THE IDENTIFICATION OF STRUCTURAL CHARACTERISTICS¹
By T. C. KOOPMANS AND O. REIERSØL
Cowles Commission for Research in Economics 
1. Introduction. 


1.1. “Population” versus “structure.” In a fundamental paper (Fisher, [1]) 
R. A. Fisher distinguished as the first group of problems in mathematical statis- 
tics the “specification of the mathematical form of the population from which 
the data are regarded as a sample.” It is the purpose of this article to suggest a 
reformulation of the specification problem, appropriate to many applications 
of statistical methods, and to point out the consequent emergence of a new 
group of problems, to be called identification problems. 

In many fields the objective of the investigator’s inquisitiveness is not just 
a “population” in the sense of a distribution of observable variables, but a 
physical structure projected behind this distribution, by which the latter is 
thought to be generated. The word “physical” is used merely to convey that 
the structure concept is based on the investigator’s ideas as to the “explanation” 
or “formation” of the phenomena studied, briefly, on his theory of these phe- 
nomena, whether they are classified as physical in the literal sense, biological, 
psychological, sociological, economic or otherwise. Examples of such structures, 
drawn from the fields of economic fluctuations and of psychological factor 
analysis, are given in sections 3 and 4. More detailed discussions of these exam- 
ples can be found in other publications by the present authors and by others 
[15], [19]. In this article, we are therefore not concerned with the merits of par- 
ticular assumptions entering into the specifications considered. Our examples 
are used only as the basis for a generalizing formulation (Section 2) and a com- 
parative discussion (Section 5) of the identification problem, i.e., the problem 
of drawing inferences from the probability distribution of the observed variables 
to the underlying structure. The belief is here expressed that this is a general 
and fundamental problem arising, in many fields of inquiry, as a concomitant 
of the scientific procedure that postulates the existence of a structure. 

The general formulation of the identification problem in Section 2 is, there- 
fore, held abstract. Some readers may prefer to give substance to the various 
concepts by reading Sections 3-4 alongside Section 2. In addition, we insert 
here a simple example showing the main features of the identification problem. 

1 To be included in Cowles Commission Papers, New Series, No. 39. The authors reported
on this study in papers before the Berkeley meeting of the Institute of Mathematical
Statistics in June 1948. We are indebted to Dr. G. Rasch of the University of Copenhagen
and to Professor L. L. Thurstone of the University of Chicago for many fruitful discussions
on the subject matter of this article, for which the responsibility lies exclusively with the
authors.

1.2. A simple example of the identification problem. This example is concerned
with the problem of estimating the parameters $\alpha$, $\beta$ of a linear relationship

(1.1) $\eta_2 = \alpha + \beta \eta_1$

between two variables $\eta_1$ and $\eta_2$, both of which are observed only subject to
errors of observation $u_1$ and $u_2$. Thus, observations are available only for the
variables

(1.2) $y_i = \eta_i + u_i$, where $E(u_i) = 0$, $i = 1, 2$.

The question under what conditions a consistent estimate of $\beta$ exists has
repeatedly attracted attention. To discuss this question, we shall consider a
model in which $\eta_1$ is independent of $(u_1, u_2)$ and in which the joint distribution
of $u_1$ and $u_2$ is normal.

If also the distribution of $\eta_1$ is normal, it is easy to see that $\beta$ cannot be
determined from a knowledge of the joint probability distribution of the observed
variables $y_1$ and $y_2$.² In this case the joint distribution of $y_1$ and $y_2$ is also normal
and the distribution is completely characterized by five parameters, $E(y_1)$,
$E(y_2)$, $\mathrm{var}(y_1)$, $\mathrm{var}(y_2)$, and $\mathrm{cov}(y_1, y_2)$. The parameters $\beta$ and $\mathrm{var}(\eta_1)$ may
now be chosen in any way such that the second term in the right hand member of

$$\begin{bmatrix} \mathrm{var}(y_1) & \mathrm{cov}(y_1, y_2) \\ \mathrm{cov}(y_1, y_2) & \mathrm{var}(y_2) \end{bmatrix} = \mathrm{var}(\eta_1) \begin{bmatrix} 1 & \beta \\ \beta & \beta^2 \end{bmatrix} + \begin{bmatrix} \mathrm{var}(u_1) & \mathrm{cov}(u_1, u_2) \\ \mathrm{cov}(u_1, u_2) & \mathrm{var}(u_2) \end{bmatrix}$$

is a positive definite matrix. It is clear that if the left hand member is
non-singular, this condition can be met for any arbitrary value of $\beta$ combined with
a sufficiently small value of $\mathrm{var}(\eta_1)$.
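
The argument for the normal case can be checked numerically. The following Python sketch (NumPy assumed; the matrix `sigma_y`, the candidate pairs, and the function names are illustrative choices, not taken from the article) fixes one covariance matrix for $(y_1, y_2)$, subtracts the contribution of $\eta_1$ for several values of $\beta$, and tests whether the remainder is still a valid covariance matrix for the errors.

```python
import numpy as np

def error_cov(sigma_y, beta, var_eta):
    """Covariance matrix left for (u1, u2) once the eta_1 term is removed."""
    signal = var_eta * np.array([[1.0, beta], [beta, beta ** 2]])
    return sigma_y - signal

def is_positive_definite(m):
    """True if all eigenvalues of the symmetric matrix m are positive."""
    return bool(np.all(np.linalg.eigvalsh(m) > 0))

# Illustrative covariance matrix of the observed variables (an assumption).
sigma_y = np.array([[2.0, 1.0],
                    [1.0, 3.0]])

# Very different slopes, each paired with a small enough var(eta_1).
candidates = [(0.5, 0.1), (-4.0, 0.01), (10.0, 0.001)]
results = [is_positive_definite(error_cov(sigma_y, b, v)) for b, v in candidates]
print(results)
```

Every candidate pair leaves a positive definite error covariance matrix, so each value of $\beta$ is compatible with the same normal distribution of the observations. A pair with too large a variance, such as `(0.5, 5.0)`, fails the check.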

It can be shown that $\beta$ is uniquely determined by the joint probability
distribution of $y_1$ and $y_2$ if this distribution is not normal. We shall prove this in
the case that certain semi-invariants exist.³

Let $\varphi_{y_1 y_2}(t_1, t_2)$ denote the characteristic function of the joint distribution
of $y_1$ and $y_2$,

(1.3) $\varphi_{y_1 y_2}(t_1, t_2) = E\left(e^{i t_1 y_1 + i t_2 y_2}\right)$,

and let

(1.4) $\psi_{y_1 y_2}(t_1, t_2) = \log \varphi_{y_1 y_2}(t_1, t_2)$.

Similar notations will be used for the characteristic functions of other random
variables, and the logarithms of these functions.
Since $(u_1, u_2)$ and $(\eta_1, \eta_2)$ are independent, we obtain

(1.5) $\psi_{y_1 y_2}(t_1, t_2) = \psi_{\eta_1 \eta_2}(t_1, t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

2 See [13], middle of page 70. 

3 The following proof is analogous to that given by Geary [8] in the case when the u’s 
are not supposed to be normally distributed, but independent. 
and from equations (1.1) and (1.3) we obtain

$\varphi_{\eta_1 \eta_2}(t_1, t_2) = E\left(e^{i t_1 \eta_1 + i t_2 \eta_2}\right) = e^{i \alpha t_2} \varphi_{\eta_1}(t_1 + \beta t_2)$,

or

(1.6) $\psi_{\eta_1 \eta_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2)$.

Combining (1.5) and (1.6), we have

(1.7) $\psi_{y_1 y_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

where $\psi_{u_1 u_2}(t_1, t_2)$ is a polynomial of second degree, since the joint distribution
of $u_1$ and $u_2$ is normal. Let $\kappa_{rs}$ be the semi-invariants of the distribution of $(y_1, y_2)$
and let $\kappa_r$ be the semi-invariants of the distribution of $\eta_1$. Comparing coefficients
in equation (1.7), we obtain

(1.8) $\kappa_{rs} = \beta^s \kappa_{r+s}$ $(r + s \geq 3)$

and from this equation again

(1.9) $\kappa_{rs} = \beta \kappa_{r+1, s-1}$ $(r + s \geq 3,\ s \geq 1)$.


If at least one $\kappa_{rs}$ with $r + s \geq 3$ is finite and different from zero (which
implies that the joint distribution of $y_1$ and $y_2$ is not normal), $\beta$ may be
determined from one such equation given the joint distribution function of $y_1$ and $y_2$.
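
Relation (1.9) with $r = 1$, $s = 2$ gives $\kappa_{12} = \beta \kappa_{21}$, and third-order semi-invariants coincide with third-order central moments. The Python sketch below (NumPy assumed) simulates the model under assumptions that are ours, not the article's: an exponential distribution for $\eta_1$, the particular values of $\alpha$ and $\beta$, the error covariance matrix, and the sample size. It recovers the slope from sample moments only as an illustration of the identification argument, not as a recommended estimator.

```python
import numpy as np

def beta_from_third_cumulants(y1, y2):
    """Ratio kappa_12 / kappa_21 of third-order joint sample cumulants."""
    c1 = y1 - y1.mean()
    c2 = y2 - y2.mean()
    k21 = np.mean(c1 * c1 * c2)  # sample analogue of kappa_{21}
    k12 = np.mean(c1 * c2 * c2)  # sample analogue of kappa_{12}
    return k12 / k21

rng = np.random.default_rng(0)
n = 1_000_000
alpha, beta = 1.0, 2.0  # illustrative structural parameters

# Non-normal "true" variable (assumption): exponential, third cumulant = 2.
eta1 = rng.exponential(1.0, size=n)

# Normal, mutually correlated errors of observation, independent of eta1.
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)

y1 = eta1 + u[:, 0]
y2 = alpha + beta * eta1 + u[:, 1]

beta_hat = beta_from_third_cumulants(y1, y2)
print(beta_hat)
```

If `eta1` is drawn from a normal distribution instead, both third-order cumulants are zero in the population and the ratio is undefined, which is the non-identifiable case described earlier.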

1.3. Remarks on the history of the identification problem. The identification 
problem has been discussed, in various terminologies and formulations, by 
quantitative thinkers in several fields. It is interesting to note that most of the 
contributions have come from researchers whose main attention was directed 
to particular fields of application. For this reason, perhaps, its general formula- 
tion was not attempted until recently. 

In economics, contributions of increasing explicitness and generality were 
made by Pigou [18], Henry Schultz [20], Frisch [3], [4], [5], [6], [7], Marschak [17]. 
The main contributions to the formalization and explicit mathematical analysis 
of the problem were made so far by Haavelmo [9], Koopmans and Rubin [15], 
Wald [24], and Hurwicz [10]. 

In his books on factor analysis [21], [22], Thurstone discusses in several places 
questions of identifiability. Previously the lack of identifiability in a certain 
factor analysis model had been demonstrated by numerical examples by G. H. 
Thomson [27]. Models used in the analysis of latent structure in attitude and 
opinion research by Lazarsfeld [16] give rise to similar identification problems. 
In biometrics, the “method of path coefficients” of Sewall Wright [25] is essentially
a method where a structure is postulated behind the observable distribution,
and the identifiability of that structure discussed. The identification
problem is also met with in the theory of the design of experiments, particularly
in the method of confounding (Fisher [2], Chapter 7, Yates [26]). When confounding
is used, the identifiability of certain parameters (second order interactions, say)
is sacrificed in order to gain certain advantages in the testing of
hypotheses concerning (and in the estimation of) the parameters that remain
identifiable (main effects and first order interactions, say).


2. General formulation of the identification problem. 


2.1. Latent variables, observed variables, and structure. In each of the examples
considered in this article, the distributional specification applies directly to
certain non-observable or in any case non-observed variables, variously referred
to as errors of observation (like $u_1$ and $u_2$ above), disturbances, “true” variables
(like $\eta_1$ above), specific factors, etc. We shall refer to these as latent variables,
denoted by a vector $u$. In addition, certain structural relationships, like (1.1)
and (1.2), are specified which connect the latent variables with the observed
variables, denoted by a vector $y$. The specification is therefore concerned with
the mathematical forms of both the distribution of the latent variables and the
relationships connecting observed and latent variables.

The term “mathematical form” carries a suggestion of parametric specification 
which obviously is not the only possible type. We shall therefore employ terms 
and concepts introduced by Hurwicz [10] which cover both parametric and
non-parametric specifications. By a structure $S = (F, \varphi)$ we understand a particular
probability distribution function


(2.1) F(u) 


of the latent variables—thought of, if you wish, as given numerically to a 
desired degree of accuracy, either by a cumulative distribution surface or curve 
or table, or parametrically by numerical values of the parameters—combined 
with a particular structural relationship (or set of simultaneously valid rela- 
tionships) 


(2.2) $\varphi(y, u) = 0$


between observed and latent variables—again given numerically by curves, 
surfaces or parameters—which permits unique determination of the observed 
variables y from the values of the latent variables u (except possibly for a set 
of u-values occurring with probability zero). The corresponding probability 
distribution 


(2.3) H(y | S) 


of the apparent variables is therefore uniquely determined by the structure S, 
and is said to be generated by S. 

2.2. Specification of a model. We shall use the term model to signify a set of
structures. We can thus say that the specification problem is concerned with
specifying a model⁴ $\mathfrak{S}$ which by hypothesis contains the structure S generating
the distribution H of the observed variables.

4 A set will be denoted by a German character corresponding to the Latin character
denoting its representative element.

As a result of this reformulation of the specification problem, a new problem
of inference arises, which logically precedes all problems of estimation or of
testing hypotheses. It has already been deduced from the definition of structure
that a given structure S generates one and only one probability distribution
$H(y \mid S)$ of the apparent variables. However, statistical inference from any
number of observations can relate only to characteristics of the distribution of 
the observed variables. The limit of statistical inference is an exact knowledge 
of this distribution function, a limit not attainable but approachable if very 
large samples can be taken. Anything not implied in this distribution is not a 
possible object of statistical inference. 

2.3. Identifiability of structural characteristics by a model. It is therefore a
question of great practical importance whether a statement converse to the one
just made is valid: can the distribution H of apparent variables, generated by a
given structure S contained in a model $\mathfrak{S}$, be generated by only one structure in
that model? This is by no means implied in the definitions given, and it is not
generally true. Whether or not it is true in a particular instance depends, as
illustrated in our examples, always on the model $\mathfrak{S}$, and often on the given
structure S besides. If it is true, we shall say that the model $\mathfrak{S}$ identifies the given
structure S, or that the structure S is identifiable by the model.⁵

If a structure S is not identifiable by a model $\mathfrak{S}$, some of its characteristics
may still be uniquely determinable. By a structural parameter $\theta(S)$ we
understand a functional of the structure S. (This definition applies, of course, equally
to the case of non-parametric specification of the functions $F$, $\varphi$ defining the
structure.) We further define that two structures $S$ and $S^*$ are (observationally)
equivalent if they generate the same distribution of observed variables,

(2.4) $H(y \mid S) = H(y \mid S^*)$ for all $y$.

We then say that a model $\mathfrak{S}$ identifies a parameter $\theta(S)$ in a structure $S_0$,
if that parameter has the same value in all structures $S_0^*$ contained in $\mathfrak{S}$ and
equivalent to $S_0$. This definition can obviously be extended to characteristics
$\chi(S)$ of a structure S, other than parameters, such as the functional form of a
relationship represented by a component of the vector $\varphi$, etc.

2.4. The identification problem. It has now become clear that our reformulation 
of the specification problem has given rise to a new group of identification prob- 
lems: to determine which of the parameters or other characteristics of a given 
structure are identifiable by (or “within”) a given model.

It is perhaps premature to attempt assigning to identification problems a 
definite place in a classification of statistical problems such as was undertaken 
by Fisher. One might regard problems of identifiability as a necessary part of 
the specification problem. We would consider such a classification acceptable, 
provided the temptation to specify models in such a way as to produce
identifiability of relevant characteristics is resisted. Scientific honesty demands that
the specification of a model be based on prior knowledge of the phenomenon
studied and possibly on criteria of simplicity, but not on the desire for
identifiability of characteristics in which the researcher happens to be interested.

5 The concept here designated briefly as “identifiability” has been called “unique
identifiability” in another context (Koopmans and Rubin [15], also Hurwicz [10]) in
contrast with “multiple” or “incomplete” identifiability.

Identification problems are not problems of statistical inference in a strict 
sense, since the study of identifiability proceeds from a hypothetical exact 
knowledge of the probability distribution of observed variables rather than 
from a finite sample of observations. However, it is clear that the study of 
identifiability is undertaken in order to explore the limitations of statistical 
inference. 

2.5. Identifiability is subject to statistical test. Further interpenetration of the
pre-statistical analysis of identifiability with problems of statistical inference
proper arises from the fact, amply illustrated by our examples, that the
identifiability of a structural characteristic $\chi(S)$ often depends not only on the model,
but also on the given structure S. Thus, each structural characteristic $\chi$ divides
the model $\mathfrak{S}$ exhaustively into two mutually exclusive subsets of structures

(2.5) $\mathfrak{S} = \mathfrak{S}_{\chi} + \mathfrak{S}_{\bar{\chi}}$

(of which one may be empty), such that $\chi(S)$ is uniquely identifiable in $S_0$ by
the model if $S_0$ belongs to $\mathfrak{S}_{\chi}$, and not uniquely identifiable if $S_0$ belongs to $\mathfrak{S}_{\bar{\chi}}$.
We shall call $\chi(S)$ uniformly identifiable by $\mathfrak{S}$ if $\mathfrak{S}_{\chi}$ coincides with $\mathfrak{S}$.

The subdivision of $\mathfrak{S}$ into $\mathfrak{S}_{\chi}$ and $\mathfrak{S}_{\bar{\chi}}$ has an important property: If $S_0$ belongs
to $\mathfrak{S}_{\chi}$, then all structures $S_0^*$ equivalent to $S_0$ also belong to $\mathfrak{S}_{\chi}$, and a similar
statement holds for $\mathfrak{S}_{\bar{\chi}}$. This property follows directly from the definition of
identifiability of $\chi(S)$ given above. Its meaning is that the identifiability of
$\chi(S)$ in $S_0$ depends only on the distribution $H(y) = H(y \mid S_0)$ of observed
variables generated by $S_0$. To the subdivision of the model corresponds an
exhaustive subdivision

(2.6) $\mathfrak{H} = \mathfrak{H}_{\chi} + \mathfrak{H}_{\bar{\chi}}$

of the set

(2.7) $\mathfrak{H} = \mathfrak{H}(\mathfrak{S})$

of all distribution functions $H(y \mid S)$ generated by the structures S of $\mathfrak{S}$, into
the subset $\mathfrak{H}_{\chi}$ containing those distribution functions $H(y \mid S)$ generated by
structures S in which $\chi(S)$ is uniquely identifiable, and the subset $\mathfrak{H}_{\bar{\chi}}$ containing
functions $H(y \mid S)$ generated by structures for which the opposite is true.
Hence, whenever the identifiability of $\chi(S)$ cannot be decided in the same
sense (affirmatively or negatively) for all structures S of $\mathfrak{S}$ as a result of either
$\mathfrak{S}_{\chi}$ or $\mathfrak{S}_{\bar{\chi}}$ being empty, then the identifiability of the characteristic $\chi(S)$ of
the structure S generating the observations is a property of the distribution
$H(y \mid S)$ of the observations. This identifiability is equivalent to the hypothesis

(2.8) $H(y \mid S)$ belongs to $\mathfrak{H}_{\chi}$,

which is in principle⁶ subject to statistical test under the maintained hypothesis

(2.9) $H(y \mid S)$ belongs to $\mathfrak{H}$.


2.6. Testing particular specifications. Often the model is defined by one general
specification supplemented with a number of particular specifications which are
“detachable pieces” in the sense that they can be removed, added or replaced
by alternatives to construct alternative models. We may define the general
specification as a set $\mathfrak{S}$ of structures which is postulated to contain the model $\mathfrak{S}'$
in question as a subset. Particular specifications can then be defined as subsets
$\mathfrak{S}_1, \mathfrak{S}_2, \cdots$ of $\mathfrak{S}$ of which the model $\mathfrak{S}'$ is the intersection

(2.10) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$.


An example is that of parametric specification of the “form” of the functions
$\varphi(y, u)$ defining the structural relationships and of the distribution function
$F(u)$ of latent variables as the general specification, and specifications of the
values of certain parameters of $\varphi$ and $F$ as particular specifications.

In such situations, it is an important question whether a given particular 
specification is—again in principle—subject to statistical test. Whenever the 
answer depends on the other particular specifications, we may ask further which 
minimum set of other particular specifications must (together with the general 
specification) be entered into the “maintained hypothesis” in order that that 
given particular specification be subject to statistical test. A formal answer to 
this question, facilitating specific answers in each concrete case, can be given 
as follows. 

Let a model $\mathfrak{S}$ be narrowed down to an alternative model

(2.11) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1$,

by a particular specification $\mathfrak{S}_1$. This particular specification will be called
observationally restrictive if the set $\mathfrak{H}(\mathfrak{S}')$ of all distribution functions $H(y \mid S')$
of observed variables generated by the structures $S'$ of $\mathfrak{S}'$ is a proper subset
of the set $\mathfrak{H}(\mathfrak{S})$ of all distribution functions $H(y \mid S)$ generated by the structures
S of $\mathfrak{S}$. A statistical test of the particular specification $\mathfrak{S}_1$ can then be constructed
by choosing as the hypothesis subject to test

(2.12) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S}')$,

and as the maintained hypothesis

(2.13) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S})$.

The particular specification $\mathfrak{S}_1$ remains subject to test if the model $\mathfrak{S}$ is stripped
of such other particular specifications which are not necessary for the
observationally restrictive character of $\mathfrak{S}_1$, although of course the outcome of the test
may become either less or more certain as a result.

6 See subsection 2.7 below.

A frequent case of an observationally restrictive specification is that where a
parameter $\theta(S)$ already identifiable in almost all structures S of $\mathfrak{S}$ is restricted
by $\mathfrak{S}_1$ to a prescribed value (or to a prescribed point set not containing all
points of its domain for all S of $\mathfrak{S}$). In this case, the specification in question
has been called overidentifying.

2.7. Remarks on the testing of hypotheses. In subsections 2.5 and 2.6 we have
without further inquiry applied the expression “hypothesis in principle subject
to test” to any hypothesis which narrows down the set $\mathfrak{H}$ of distribution functions
H generated by structures of the model to a proper subset $\mathfrak{H}'$. It will be
clear that, to make a test actually possible, $\mathfrak{H}'$ cannot be allowed to be everywhere
dense in $\mathfrak{H}$. For instance, if $\mathfrak{H}$ is defined parametrically, a hypothesis
restricting $\mathfrak{H}'$ to rational values of the parameters is clearly not subject to
statistical test. Just what set-theoretical requirements on $\mathfrak{H}'$ are needed to make a
test possible is a separate problem which we shall not attempt to discuss.

We have also in another sense oversimplified the problem of testing particular 
specifications. In practice this problem presents itself as the choice of one out 
of many possible combinations of several particular specifications, rather than 
a number of separate and unconnected choices between the rejection and the 
adoption of each particular specification under consideration. Present theory 
of choice between two alternatives does not meet this situation. 


3. An econometric example.⁷


In econometric studies⁸ economic fluctuations have been described by a system
of difference equations in (observed) economic variables y, subject to two kinds 
of outside influences, emanating respectively from (observed) exogenous—i.e., 
non-economic—variables z, and from (latent) random disturbances u. Each of 
these equations is given a definite meaning in terms of economic behavior. There 
may for instance be equations explaining respectively consumption expenditure 
(from incomes of various groups, price changes, etc.), the supply of consumers’ 
goods (from price margins between such goods and their raw materials and labor, 
productive capacity, etc.), investment expenditure, the supply of capital goods, 
etc. The purpose of the identification discussion is to investigate whether, on 
the basis of given a priori knowledge as to the form of these equations, and in 
particular as to what variables occur in any designated equation, procedures of 
estimation or testing of hypotheses can be directed to the parameters of the 
equations of economic behavior themselves, rather than to the parameters of 
“secondary” equations dependent on (derivable from) two or more of the be- 
havior equations. 

In the case of linear systems of equations, a possible form for the general
specification (the model $\mathfrak{S}$) is as follows.

(3.1) $B_0 y'(t) + B_1 y'(t - 1) + \cdots + B_{\tau_{\max}} y'(t - \tau_{\max}) + \Gamma z'(t) = u'(t)$

represents the structural relationships. Here $y'(t)$, $z'(t)$, $u'(t)$ are column
vectors (the transposes of row vectors) of G, K and G elements, respectively,
for each discrete time point or period $t = 1, 2, \cdots, T$, also $t = 0, -1, \cdots, 1 - \tau_{\max}$,
for $y(t)$. $B_0, B_1, \cdots, B_{\tau_{\max}}$ are square matrices of
order G, and $\Gamma$ is a matrix of G rows and K columns.

7 For an expository discussion of identification problems in econometric models see [14].
8 See, for instance, J. Tinbergen [23] and L. R. Klein [12].

(3.2) Bo is non-singular. 


(3.3) The observed values $z(t)$, $t = 1, \cdots, T$, are held constant in repeated
samples, and the components of $z(t)$ are linearly independent.


(3.4) The components of $u(t)$ have a joint distribution function $F(u)$ (with
zero means and finite variances) which is independent of $t$ and of $z(t)$.


(3.5) $u(t)$ and $u(t')$ are independently distributed if $t \neq t'$.


Particular specifications $\mathfrak{S}_1, \mathfrak{S}_2, \cdots$, that have been most frequently
employed indicate prescribed values (usually zero) of specified elements of the
matrix

(3.6) $A = [B_0 \;\; B_1 \;\cdots\; B_{\tau_{\max}} \;\; \Gamma]$

or of given linear functions of the elements of the $g$-th row $\alpha(g)$ of A, for each
value $g = 1, \cdots, G$ of g. It can always be arranged that of the linear restrictions
on any one row of A, at most one is non-homogeneous (normalization rule), the
others homogeneous. The homogeneous restrictions state which variables enter
into each equation, and possibly with which ratios between some of their
coefficients.

It has been shown [15] that in the model $\mathfrak{S}$, a necessary and sufficient condition
for the equivalence of two structures $S = \{F(u), A\}$ and $S^* = \{F^*(u^*), A^*\}$
is that they are connected by a linear transformation

(3.7) $A^* = \Upsilon A$, $u'^* = \Upsilon u'$,

with non-singular matrix $\Upsilon$. By definition, the model

(3.8) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$

identifies a parameter $\alpha_{gk}$ if, whenever $A$ and $A^*$ belong to equivalent structures
$S$ and $S^*$, respectively, of $\mathfrak{S}'$, we have

(3.9) $\alpha^*_{gk} = \alpha_{gk}$.


In order to attain such identifiability by linear restrictions on the $g$-th row of A
it is necessary that one non-homogeneous restriction (normalization rule) on
the $g$-th row of A be specified in $\mathfrak{S}'$. Recalling that G represents the number of
rows (and the rank) of A, it can be proved that it is further necessary for the
simultaneous identifiability of all elements $\alpha_{gk}$, $k = 1, \cdots, K$, in the $g$-th row
$\alpha(g)$ of A, that at least $G - 1$ additional homogeneous restrictions be imposed
on that row, say

(3.10) $\alpha(g)\Phi'(g) = 0$, $\rho\{\Phi'(g)\} \geq G - 1$,

where $\alpha(g) = [\alpha_{g1} \cdots \alpha_{gK}]$, the $\Phi(g)$ are given matrices (often with elements
0 or 1 only), and $\rho(X)$ denotes the rank of X. These restrictions (3.10) are also
sufficient (in addition to the normalization rule) if

(3.11) $\rho\{A\Phi'(g)\} = G - 1$.

The $g$-th row of the “rank criterion matrix” $A\Phi'(g)$ in (3.11) consists of zeros only,
because of (3.10). Therefore, (3.11) requires the other rows of that matrix to
be linearly independent.⁹

Thus, even if the model $\mathfrak{S}'$ includes, besides a normalization rule, the necessary
condition (3.10) for the identifiability of the $g$-th behavior equation, such
identifiability is still absent in certain structures, corresponding to a point set
(generally of measure zero) in the space of the coefficients of the remaining
equations, viz., the point set in which (3.11) is not satisfied. Whether or not A actually
falls within this point set is, as was stated before in more general terms, a
property of the joint distribution function $H(y \mid z)$ of the observations y, and is
therefore subject to statistical test. In the present case, this is also seen from
the fact that the rank of $A\Phi'(g)$ is preserved by the transformation (3.7), and is
therefore itself an identifiable parameter.
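
The rank criterion (3.11) is easy to evaluate for a given coefficient matrix. The following Python sketch (NumPy assumed) uses a hypothetical two-equation system with no lags, so that $G = 2$ and the columns of A refer to two jointly dependent and two exogenous variables. The numerical coefficients, the helper names, and the single exclusion restriction on the first equation are illustrative assumptions, not taken from the article.

```python
import numpy as np

def rank_condition_holds(A, Phi_t):
    """Check (3.11): the rank of A Phi'(g) equals G - 1, G = rows of A."""
    G = A.shape[0]
    return int(np.linalg.matrix_rank(A @ Phi_t)) == G - 1

def make_A(d):
    """Hypothetical A = [B_0  Gamma]; columns are y1, y2, z1, z2."""
    return np.array([[1.0, 0.5, -0.3, 0.0],   # equation 1 excludes z2
                     [1.0, -0.8, 0.0, d]])    # equation 2 has coefficient d on z2

# Phi'(1) picks out the z2 column, encoding the restriction alpha(1) Phi'(1) = 0.
Phi_t = np.array([[0.0], [0.0], [0.0], [1.0]])

identified_generic = rank_condition_holds(make_A(0.6), Phi_t)     # d != 0
identified_degenerate = rank_condition_holds(make_A(0.0), Phi_t)  # d == 0

# The rank of A Phi'(g) is unchanged by a non-singular transformation (3.7).
Upsilon = np.array([[2.0, 1.0], [0.5, 3.0]])
rank_before = int(np.linalg.matrix_rank(make_A(0.6) @ Phi_t))
rank_after = int(np.linalg.matrix_rank(Upsilon @ make_A(0.6) @ Phi_t))
print(identified_generic, identified_degenerate, rank_before, rank_after)
```

The degenerate case `d == 0` is the kind of measure-zero point set described above: the restriction on the first equation is still imposed, but the second equation no longer contains the excluded variable, so the two equations cannot be told apart.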

For certain scientific purposes explicit knowledge of A is unnecessary. One
such purpose is “prediction without change in structure,” i.e., prediction of a
value of $y(t)$ for a future time $t$ from a hypothetical value of $z(t)$ on the
assumption that A and $F(u)$ have not changed between the observation period and the
time point to which the prediction applies. Such prediction can be based on
the knowledge of (a) the population regressions

(3.12) $y'(t) = \Pi_1 y'(t - 1) + \cdots + \Pi_{\tau_{\max}} y'(t - \tau_{\max}) + \Pi_z z'(t) + v'(t)$

of the “jointly dependent” variables $y(t)$ on the “predetermined” variables
$y(t - 1), \cdots, y(t - \tau_{\max}), z(t)$ and of (b) the distribution function $K(v)$ of
the population residuals

(3.13) $v(t) = y(t) - E\{y(t) \mid y(t - 1), \cdots, y(t - \tau_{\max}), z(t)\}$

from these regressions. Of course, the matrices $\Pi$ are functions of the
structural parameters (3.6) through

(3.14) $[-I \;\; \Pi] = [-I \;\; \Pi_1 \;\cdots\; \Pi_{\tau_{\max}} \;\; \Pi_z] = -B_0^{-1} A$

and $K(v)$ can be derived from $F(u)$ through the transformation

(3.15) $v' = B_0^{-1} u'$.

The important fact is that $\Pi$ and $K(v)$, by their definitions, depend only on the
distribution function $H(y \mid z)$ of the observations, and are therefore uniformly
identifiable. This is also reflected in the fact that the right hand members of
(3.14) and (3.15) are invariant for the transformation (3.7).
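
The invariance of the regression coefficients can be verified directly: if A is replaced by a non-singular linear combination of its rows, the leading block $B_0$ is transformed in the same way and the product of the inverse of $B_0$ with A is unchanged. A minimal Python sketch follows (NumPy assumed; the matrices and the helper name `reduced_form` are illustrative assumptions, with no lagged terms so that A consists of $B_0$ followed by the coefficients of z).

```python
import numpy as np

def reduced_form(A, G):
    """Return -B_0^{-1} A, where B_0 is the leading G x G block of A."""
    B0 = A[:, :G]
    return -np.linalg.solve(B0, A)

# Hypothetical structural coefficients: G = 2 equations, K = 2 exogenous variables.
A = np.array([[1.0, 0.5, -0.3, 0.0],
              [1.0, -0.8, 0.0, 0.6]])

# Any non-singular matrix yields an observationally equivalent structure.
Upsilon = np.array([[2.0, 1.0],
                    [0.5, 3.0]])
A_star = Upsilon @ A

same_reduced_form = bool(np.allclose(reduced_form(A, 2), reduced_form(A_star, 2)))
same_structure = bool(np.allclose(A, A_star))
print(same_reduced_form, same_structure)
```

The two structures differ in every row of A yet imply the same regression coefficients, which is why prediction without change in structure does not require A to be identified.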


9 In that case, overidentification of $\alpha(g)$ will result if the inequality sign in (3.10) holds.

However, the most relevant economic problems are those in which a change
in A or $F(u)$ is actually or hypothetically present, and in which therefore the
identifiability of the relevant parts or functions of A and of the characteristics
of $F(u)$ requires separate inquiry.¹⁰


4. An example from factor analysis.¹¹ Factor analysis has been presented in
different forms by different authors. We shall here consider the multiple factor 
analysis of Thurstone only [21], [22]. 

The factor analysis methods were developed primarily for the purpose of 
analyzing intelligence tests, but they have also been used for other psychological 
problems and in other sciences. 

Suppose that a person is given a battery of G tests. Let his score in test $i$
be $y_i$. The fundamental assumption in factor analysis is that these scores can
be explained in terms of a relatively small number of hypothetical primary
factors. Let $z_1, z_2, \cdots, z_p$ denote the hypothetical scores of the person in the
common factors, i.e., those primary factors which are common to at least two
tests in the battery. We assume that $y_i$ is a homogeneous linear function of
the scores z plus a unique part $v_i$, which may be thought of as consisting of
an error term plus the contribution of a specific factor. The coefficients $\pi_{ik}$ in
the linear function just mentioned are called factor loadings. The factor loading
$\pi_{ik}$ expresses the relative importance of the common factor $k$ in the answering
of test $i$.

We shall introduce the row vectors $y = [y_i]$, $z = [z_k]$, $v = [v_i]$ and the matrix
$\Pi = [\pi_{ik}]$. The covariance matrices of the sets of variables y, z, and v will be
denoted by $M_{yy}$, $M_{zz}$, and A, respectively.

In contrast with the preceding example, the variables y are the only observed 
variables. The variables v and z are latent variables. 

Our model will be given by the following specifications:

(4.1) $y' = \Pi z' + v'$.

(4.2) $E(z) = 0$ and $E(v) = 0$.

(4.3) The set of variables z is stochastically independent of the set of variables v.

(4.4) A is diagonal and different from 0.

(4.5) The elements of z and v are jointly normally distributed.

(4.6) Each $y_i$ is correlated with at least one of the other y's.

(4.7) The rank of $\Pi$ equals the number $p$ of its columns.

(4.8) $M_{zz}$ is nonsingular.

(4.9) $p$ is the smallest number of variables z which is compatible with the joint
probability distribution of the observed variables y and specifications (4.1)-(4.8).

(4.10) Each column of $\Pi$ contains at least $p$ zeros (in unspecified places).

(4.11) A normalization rule fixing the units of the variables z and a rule fixing
the order of the columns of $\Pi$.

10 See Hurwicz [11].

11 Proofs of the statements in this section will be found in a separate paper by one of
the authors (Reiersøl [19]). It should be noted that the notation is different in the two
papers. In the separate paper the notation is close to that of Thurstone. In the present
paper the notation has been chosen to correspond in some way to the notation in the
econometric example. A list of corresponding symbols in the present paper and in Thurstone's
books follows:

Present paper:  $y_i$   $z_k$   $\pi_{ik}$   $G$   $p$   $M_{yy}$   $M_{zz}$   A
Thurstone:      $s_j$   $x_m$   $a_{jm}$     $n$   $r$   $R_1$      $R_{pq}$   $R_1 - R$

It should be noted that $M_{yy}$, $M_{zz}$, and A are covariance matrices of the original variables,
while $R_1$, $R_{pq}$, and $R$ are covariance matrices of standardized variables.


Denote by $\Pi_k$ the matrix consisting of all the rows of $\Pi$ which have a zero in
the $k$-th column. Let the number of rows in the matrix $\Pi_k$ be $p_k$. Let $\Pi_{ki}$ denote
the submatrix of $\Pi_k$ which we get when deleting the $i$-th row of $\Pi_k$. Using these
notations we shall formulate the final specification of our model.


(4.12) The rank of each of the matrices $\Pi_{ki}$ $(k = 1, 2, \cdots, p;\ i = 1, 2, \cdots, p_k)$
is $p - 1$.


Specification (4.1) represents the structural relationships. 

Specification (4.10) means that the experimenter thinks he can construct a 
sufficient number of tests where at least one of the common primary factors is 
absent. 

We shall first consider a model $\mathfrak{S}$ containing Specifications (4.1)-(4.9) only.
From (4.9) it follows that $p$ is uniformly identifiable.

Let $p_G = \tfrac{1}{2}(2G + 1 - \sqrt{8G + 1})$. If $p > p_G$, the matrix $\Lambda$ is generally not
identifiable. If $p < p_G$, $\Lambda$ generally is identifiable. When $p = p_G$, the number
of values of $\Lambda$ which correspond to a given covariance matrix $M_{yy}$ is usually
finite, and may be equal to one or greater than one. The matrices $\Pi$ and $M_{zz}$
are never identifiable in the model $\mathfrak{S}$. If $\Lambda$ is identifiable, the set of all
structures $\{\Pi^*, M^*_{zz}, \Lambda\}$ equivalent to the structure $\{\Pi, M_{zz}, \Lambda\}$ is given by the set
of all matrices

(4.13) $\Pi^* = \Pi\Psi$

and

(4.14) $M^*_{zz} = \Psi^{-1} M_{zz} (\Psi')^{-1}$,

where $\Psi$ is any square, $p$-rowed and nonsingular matrix.
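
The equivalence expressed by (4.13) and (4.14) can be checked numerically. The following is a minimal sketch, not part of the original argument. It assumes that (4.1) has the form y = Πz + v with z and v uncorrelated, so that the covariance matrix of y is Π M_zz Π' + Λ. The particular matrices and the helper names (`matmul`, `inv2`, `implied_myy`) are illustrative choices only.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def implied_myy(pi, mzz, lam):
    """Covariance of y implied by y = Pi z + v (assumed form of (4.1))."""
    core = matmul(matmul(pi, mzz), transpose(pi))
    size = len(lam)
    return [[core[i][j] + lam[i][j] for j in range(size)] for i in range(size)]

# Illustrative structure with G = 4 observed variables and p = 2 factors.
PI = [[Fr(1), Fr(0)], [Fr(2), Fr(1)], [Fr(0), Fr(3)], [Fr(1), Fr(1)]]
MZZ = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
LAM = [[Fr(i + 1) if i == j else Fr(0) for j in range(4)] for i in range(4)]  # diagonal

PSI = [[Fr(1), Fr(2)], [Fr(3), Fr(1)]]  # any nonsingular 2 x 2 matrix will do
PSI_INV = inv2(PSI)

PI_STAR = matmul(PI, PSI)                                     # (4.13)
MZZ_STAR = matmul(matmul(PSI_INV, MZZ), transpose(PSI_INV))   # (4.14)

# Both structures imply the same distribution of the observed variables.
same_myy = implied_myy(PI, MZZ, LAM) == implied_myy(PI_STAR, MZZ_STAR, LAM)
print(same_myy)
```

Exact rational arithmetic is used so that the two implied covariance matrices can be compared with `==` rather than up to rounding error.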
In the following we shall confine our discussion to the case $p < p_G$, and to
structures in which the matrix $M_{yy}$ is such that $\Lambda$ is identifiable in $\mathfrak{S}$.

We shall now consider the model $\mathfrak{S}'$ defined by Specifications (4.1)-(4.11).
In this model a necessary and sufficient condition for the identifiability of $\Pi$ is
that any square $p$-rowed minor of $\Pi$ which is of rank $p - 1$ is contained in one of
the matrices $\Pi_k$. This condition excludes the possibility that all elements
belonging to the intersection of $p - 1$ rows and two columns of $\Pi$ are equal to
zero. In order to be able to use this result, the experimenter would have to be
able to construct tests where one, but not more than one, common factor would
be absent. Therefore the result is not particularly useful. In order not to exclude
the case where two common factors occur in more than $p - 2$ tests, we have
introduced Specification (4.12).

We shall finally consider the model $\mathfrak{S}''$ defined by Specifications (4.1)-(4.12).
Assuming $M_{yy}$ known, we can determine some value $\Pi^*$ of $\Pi$ which satisfies
Specifications (4.1)-(4.9). Since, by assumption, $\Lambda$ is identifiable in $\mathfrak{S}$, $\Pi^*$ must be
of the form $\Pi\Psi$, where $\Pi$ is the true factor loadings matrix and $\Psi$ is non-singular.
Let $\Pi^*_k$ be a submatrix of $\Pi^*$ containing all the columns of $\Pi^*$ and satisfying the
following conditions

(4.15) The rank of $\Pi^*_k$ is $p - 1$.

(4.16) The addition to $\Pi^*_k$ of a row contained in $\Pi^*$ but not in $\Pi^*_k$ increases
the rank to $p$.

(4.17) Each submatrix of $\Pi^*_k$ obtained by deleting one row of $\Pi^*_k$ has rank $p - 1$.

A necessary and sufficient condition for the identifiability of $\Pi$ in the complete
model $\mathfrak{S}''$ is that there exist exactly $p$ submatrices $\Pi^*_k$ of $\Pi^*$ which satisfy
conditions (4.15)-(4.17), and that the $p$ vectors $q_k$, satisfying the equations
$\Pi^*_k q_k = 0$ when $k = 1, 2, \ldots, p$, are linearly independent.

It should be noted that Specifications (4.10) and (4.12) are observationally re- 
strictive, i.e., they are in principle subject to statistical test. 


5. A comparative discussion of the examples given. Some comparative re- 
marks on the three examples given in sections 1.2, 3 and 4 may illustrate our 
general discussion of the identification problem, given in section 2. 

In each of the three examples considered, the model contains a general speci- 
fication prescribing a parametric form of the structural relationships (2.2). 
Further particular specifications therefore take the form of parameter specifications
in the function $\phi(y, u)$ in (2.2) and possibly in the distribution function
(2.1) of latent variables. A comparison of the three examples shows a striking
formal similarity of the identification problems to which they give rise. This
similarity justifies our speaking of identification problems as a separate group
of problems preparatory to statistical inference, of quite widespread occurrence.
The same definitions of structure, model, parameter, identifiability are applicable 
and useful in each example. In all three cases, parameters occur, the identifiability 
of which depends on other identifiable structural characteristics (the normality 
of a distribution function in one case, the ranks of parameter matrices in the 
other two cases).

Our remaining remarks will be drawn from the econometric and factor analysis
examples only, partly because these illustrate the identification problem in
greater elaboration, partly because the closer similarity of these examples permits
us to notice interesting differences in greater detail.

Let us consider the particular case of the econometric example when there are
no time lags between the $y$'s in the structural relationships, i.e., when $\tau_{\max} = 0$.
In this case the reduced form (3.12) in the econometric example is of the same
form as equation (4.1), which defines the structural relationships in the factor
analysis example. The notation in the factor analysis example has been chosen
with this similarity in mind. However, it should be emphasized that, while the
variables $y$ are observed in both examples and the variables $v$ are latent in both
examples, the variables $z$ are observed in the econometric example and latent in
the factor analysis example, and even the number of variables $z$ is an unknown
parameter $p$ in the latter example. For this reason, the discussion of the
identifiability of $\Lambda$ in factor analysis has no counterpart in the econometric model.
Furthermore, the identifiability of the matrix $\Pi$, which is automatic and uniform
in the econometric model $\mathfrak{S}_e$, say, requires detailed specifications in the factor
analysis model $\mathfrak{S}_f$, say, including the diagonality of $\Lambda$ and prescriptions about
the number of zero elements in each column.

The observability of $z$ in the econometric case is exploited to postulate, behind
the reduced form (3.12), a structure $\{F(u), A\}$ to be identified (where possible)
from further specifications based on economic theory. Here we meet with another
analogy, with differences, between the identification problem of $A$ in $\mathfrak{S}_e$ and
that of $\Pi$ (given $\Lambda$) in $\mathfrak{S}_f$. In the latter problem, the set of matrices $\Pi^*$,
belonging to a set of equivalent structures, is given by equation (4.13). This equation
is analogous to the first of the equations (3.7) in the econometric case, with $\Pi$ in
$\mathfrak{S}_f$ now corresponding to $A'$ in $\mathfrak{S}_e$.

If we were to specify zeros in assigned places in the factor loadings matrix $\Pi$,
and to introduce a normalization rule for each column of $\Pi$, the results quoted
in the econometric example would immediately be applicable to the factor analysis
case. A necessary condition for the identifiability of $\Pi$, given that of $\Lambda$, would be
that the number of specified zeros in each column of $\Pi$ be at least $p - 1$. Necessary
and sufficient for identifiability would be that the matrix consisting of all rows of
$\Pi$ which have specified zeros in the $k$th column, be of the rank $p - 1$, for each
value of $k$.
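
The rank condition just stated lends itself to a mechanical check. The sketch below is illustrative only: the loadings matrix, the set of designated zero positions, and the function names are hypothetical and not drawn from the paper. The rank is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def rank_condition_holds(pi, designated_zeros):
    """For every column k, the rows with a designated zero in column k
    must form a matrix of rank p - 1."""
    p = len(pi[0])
    for k in range(p):
        rows_k = [pi[i] for i in range(len(pi)) if (i, k) in designated_zeros]
        if rank(rows_k) != p - 1:
            return False
    return True

# Hypothetical loadings for six tests and p = 3 factors; each column has
# exactly p - 1 = 2 designated zeros, given as (row, column) pairs.
ZEROS = {(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)}
PI_GOOD = [[0, 1, 2], [0, 3, 1], [1, 0, 1], [2, 0, 5], [1, 2, 0], [3, 1, 0]]
# Same zero pattern, but the two rows with a zero in the first column are proportional.
PI_BAD = [[0, 1, 2], [0, 2, 4], [1, 0, 1], [2, 0, 5], [1, 2, 0], [3, 1, 0]]

print(rank_condition_holds(PI_GOOD, ZEROS), rank_condition_holds(PI_BAD, ZEROS))
```

The second matrix meets the necessary counting condition (two zeros per column) but fails the rank condition, which is the distinction drawn in the text.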

However, instead of specifying that given elements of $\Pi$ be equal to zero,
Thurstone assumes that we know that there is a certain minimum number of
zeros in each column, but that we do not know which particular elements are
zero. The specification of a certain number of zeros in undesignated places
obviously represents a weaker assumption than the specification of the same number
of zeros in designated places. It is therefore not surprising that the specification
of $p - 1$ zeros in undesignated places in each column is never sufficient for
identifiability of $\Pi$. Thus, in the model $\mathfrak{S}_f$, we have introduced the stronger specification
(4.10). We have seen that even this specification is too weak to be practically
useful, and have introduced the additional Specification (4.12), which makes
the factor analysis model still more different from the econometric model.

Continuing the analogy in which $A'$ in $\mathfrak{S}_e$ corresponds to $\Pi$ in $\mathfrak{S}_f$, we note an
important feature common to both examples, and present in other situations as 
well. Even if specifications sufficient, in number and variety of “points of ap- 
plication,” for the identifiability of all structural parameters cannot be derived 
from a priori considerations, it remains possible to construct uniformly identifiable 
functions of these parameters, knowledge of which constitutes scientific informa- 
tion of more limited usefulness. 

In the econometric example we have already seen that for certain purposes a
knowledge of the uniformly identifiable matrix $\Pi$ of the reduced form is sufficient,
while for other purposes we need to know the matrix $A$. As a further illustration,
suppose that we want to test for persistence of the structure by comparing the
equation systems which we estimate from data for two different periods.
Disregarding errors of estimation (which are not our present topic), if $A$ is the same
in both cases, $\Pi$ will also be the same in both cases. It is therefore possible to
arrive at a rejection of the persistence hypothesis by determining $\Pi$ in both cases.
Suppose next that one row (or several rows) of $A$ are different in the two periods,
while the other rows of $A$ are identical in the two cases. If $B$ changes from one
period to the other, we may expect each element of $\Pi$ to change. If we can
determine $A$ for each period, the equality (as between periods) of some of the rows
of $A$ will indicate precisely the extent of validity of the persistence hypothesis.
If we cannot determine $A$ but only $\Pi$ in each case, this verification will be lost.

Similarly, it may in factor analysis be sufficient for some purposes to consider
what we may call the reduced form of $\Pi$. Let $\Pi_I$ be the upper square part of $\Pi$,
which we shall assume to be nonsingular. The matrix $A = \Pi \Pi_I^{-1}$ will be called
the reduced form of $\Pi$. It will be of the form

$A = \begin{bmatrix} I \\ A_{II} \end{bmatrix}$.

$A$ is always identifiable when $\Lambda$ is identifiable.

Suppose now that the same battery of tests is given to two different populations.
Suppose that some of the factor loadings are different in the two populations,
while other factor loadings are the same. If at least one of the different
factor loadings occurs in the matrix $\Pi_I$, then each element of $A_{II}$ may be
expected to change, and the partial identity of the two structures cannot be
discovered if we determine $A$ only and not $\Pi$. On the other hand, if $\Pi$ is the same in
both cases, also $A$ will be the same in both populations.

Let us next consider two different batteries given to the same population.
We shall suppose that the two batteries have some tests in common. For each test
which is common to the two batteries we ought to find the same factor loadings
in both batteries. In other words, the matrices $\Pi$ in the two cases ought to be
partly identical. On the other hand, if $\Pi_I$ contains rows corresponding to tests
which are not common to the two batteries, the matrices $A_{II}$ will be entirely
different in the two cases. Therefore, again, identification of $\Pi$ will be necessary
to verify the equality of the factor loadings of tests common to both batteries.
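
That the reduced form is unaffected by the transformation (4.13) follows from the identity (ΠΨ)(Π_I Ψ)^{-1} = Π Π_I^{-1}. A small numerical check is sketched below for p = 2. The matrices and helper names are illustrative assumptions, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def reduced_form(pi):
    """Pi times the inverse of its upper square part (written for p = 2)."""
    p = len(pi[0])
    return matmul(pi, inv2(pi[:p]))

# Hypothetical loadings for four tests and two factors.
PI = [[Fr(1), Fr(2)], [Fr(0), Fr(1)], [Fr(3), Fr(1)], [Fr(2), Fr(5)]]
PSI = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]   # any nonsingular 2 x 2 matrix

A = reduced_form(PI)
A_STAR = reduced_form(matmul(PI, PSI))   # reduced form of an equivalent structure

print(A == A_STAR)
```

The upper block of `A` is the identity matrix, in agreement with the partitioned form given in the text.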










A final remark relates to observationally restrictive specifications. Particularly
where the model is to a large degree speculative, empirical confirmation of
the validity or usefulness of the model is obtained only to the extent that
observationally restrictive specifications are upheld by the data. Thus, Thurstone
emphasizes that the number of factors $p$ should be well below the value $p_G$ found
above to be necessary in general for the identifiability of $\Lambda$, before a factor
analysis can be regarded as successful (Thurstone [22], p. 294).
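
For orientation, the critical value p_G = ½(2G + 1 - √(8G + 1)) of section 4 is easily tabulated. The sketch below is an illustration under our own naming. It uses the algebraic fact that p_G is the smaller root of (G - p)² = G + p, so that for 0 ≤ p ≤ G the inequality p < p_G can be tested in integers.

```python
import math

def p_bound(G):
    """Critical number of factors p_G = (2G + 1 - sqrt(8G + 1)) / 2."""
    return (2 * G + 1 - math.sqrt(8 * G + 1)) / 2

def largest_p_below_bound(G):
    """Largest integer p with p < p_G, or None if there is none."""
    # For 0 <= p <= G, p < p_G is equivalent to (G - p)^2 > G + p.
    return max((p for p in range(G + 1) if (G - p) ** 2 > G + p), default=None)

for G in (3, 5, 6, 10):
    print(G, round(p_bound(G), 3), largest_p_below_bound(G))
```

With G = 6 tests, for example, p_G = 3, so that only p = 1 or p = 2 lies strictly below the critical value.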

In econometric work, greater reliance is sometimes placed on a priori specifica- 
tion of the form of a behavior equation, particularly the variables occurring 
in it. If the linear restrictions on an equation in a linear system are just sufficient 
for its identifiability, estimation of the parameters of that equation is possible, 
but none of the identifying restrictions are themselves subject to test. Again, 
dependence on a priori information is diminished (but not eliminated) to the 
extent that a greater number of overidentifying restrictions are imposed and are 
upheld by the data.

SOME PROBLEMS IN MINIMAX POINT ESTIMATION

By J. L. HODGES, JR.,¹ AND E. L. LEHMANN
University of California, Berkeley


1. Summary. In the present paper the problem of point estimation is considered
in terms of risk functions, without the customary restriction to unbiased
estimates. It is shown that, whenever the loss is a convex function of the esti- 
mate, it suffices from the risk viewpoint to consider only nonrandomized esti- 
mates. For a number of specific problems the minimax estimates are found ex- 
plicitly, using the squared error as loss. Certain minimax prediction problems 
are also solved. 


2. Introduction. The principles most commonly applied in the selection of a 
point estimate are the principles of maximum likelihood (R. A. Fisher) and of 
minimum variance unbiased estimation (Markoff).² Both of these principles are
intuitively appealing, but neither of them can be justified very well in a sys- 
tematic development of statistics. This holds also for some modifications of these 
principles proposed by G. W. Brown [1], as the author himself points out. 

In an important early paper [2], Wald indicated a more systematic approach
to the problem, which he later developed into his general theory of statistical
decision problems [3, 4, 5]. Consider a random variable $X$ distributed over a
space $\mathcal{X}$ according to a distribution $P_\theta$ with $\theta \in \Omega$. It is desired to estimate some
$g(\theta)$. If the value $x$ of $X$ is observed one makes an estimate, say $f(x)$, and thereby
incurs a loss of $W[g(\theta), f(x)]$ when $\theta$ is the true value of the parameter. We shall
assume that the loss function is nonnegative. It then follows that the expectation
of the loss will always exist (although it may be infinite). The risk associated
with the estimate $f$ is defined to be the expected loss, as given by

(2.1) $R_f(\theta) = E_\theta W[g(\theta), f(X)] = \int_{\mathcal{X}} W[g(\theta), f(x)] \, dP_\theta(x)$.

The choice of estimate should then be made according to the risk function. As a
particular possibility Wald suggests the use of minimax estimates, i.e. estimates
which minimize $\sup_\theta R_f(\theta)$.

The main purpose of the present paper is to obtain minimax estimates for a
number of specific problems. Only few such problems have been worked out so
far, the emphasis in Wald's work having been on the general theory. In [2] Wald
obtained the minimax estimate of an unknown location parameter. Stein and
Wald [6] treated the sequential problem of estimating the mean of a normal
distribution with known variance, and in his forthcoming book Wald considers
as an example the sequential problem of estimating the mean of a random variable
distributed uniformly over an interval of length 1.

1 This work was supported in part by the Office of Naval Research.

2 Actually, the principle of minimum variance unbiased estimation goes back to Gauss.
For discussions of the history of these ideas, see E. Czuber, Theorie der Beobachtungsfehler,
Leipzig, 1891, and R. L. Plackett, "A historical note on the method of least squares",
Biometrika, Vol. 36 (1950), p. 458.

It seems worthwhile to consider further special problems both because one 
may obtain estimates that in some cases are preferable to the conventional ones, 
and because these examples throw some light on the general desirability of the 
minimax principle. As we shall see below, it does not seem possible to reach 
any definite conclusions on this latter point, and to obtain a generally valid com- 
parison between the minimax estimate and, for example, the unbiased estimate 
with uniformly smallest variance (when such an estimate exists). 

Consider, for example, the problem of estimating the probability of success from
a number of independent trials each of which may be a success or a failure, when
the loss function is the squared error. If the number of trials is one, the minimax
estimate (as is shown below) is given by $f(X) = \tfrac12 X + \tfrac14$, where $X$ is 1 or 0 as
the trial is a success or failure. As is easily seen, this estimate has smaller risk
than the usual estimate $f^*(X) = X$ whenever $0.07 \le p \le 0.93$. On the other
hand, when the number of trials is large the standard estimate $\bar{X}$ (the observed
proportion of successes) has smaller risk than the minimax estimate nearly
everywhere. The minimax estimate is only slightly better in a small interval
centered at $p = \tfrac12$, whose length tends to zero as the number of trials tends to
infinity, and is worse everywhere else.
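
The comparison just described can be reproduced numerically. The sketch below is illustrative and rests on one assumption not stated in the text above: for n trials it takes the minimax estimate to have the familiar form (X + √n/2)/(n + √n), which reduces to ½X + ¼ when n = 1. The function names are ours.

```python
import math

def risk(estimate, p, n):
    """Squared-error risk of estimate(X), X = number of successes in n trials."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) * (estimate(x) - p) ** 2
               for x in range(n + 1))

def minimax_estimate(n):
    """Assumed general form (X + sqrt(n)/2) / (n + sqrt(n)); X/2 + 1/4 for n = 1."""
    root = math.sqrt(n)
    return lambda x: (x + root / 2) / (n + root)

def standard_estimate(n):
    """Observed proportion of successes."""
    return lambda x: x / n

def better_interval(n):
    """Interval of p on which the constant risk 1 / (4 (1 + sqrt(n))^2)
    lies below the risk p(1 - p)/n of the observed proportion."""
    c = 1 / (4 * (1 + math.sqrt(n)) ** 2)
    half_width = math.sqrt(1 - 4 * n * c) / 2
    return 0.5 - half_width, 0.5 + half_width

low, high = better_interval(1)
print(round(low, 3), round(high, 3))          # 0.067 0.933
print(better_interval(100), better_interval(10000))
```

For one trial the interval agrees with the bounds 0.07 and 0.93 quoted above, and it shrinks toward p = ½ as n grows.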

For our purpose it is convenient to formulate the problem of point estimation
as follows (see in this connection [7]). A random variable $X$ is distributed over a
space $\mathcal{X}$ according to a distribution $P$ belonging to a family $\mathcal{F}$. We wish to
estimate $g(P)$ where $g$ is a function whose domain is $\mathcal{F}$ and whose range is contained
in some space $\mathcal{Y}$ (in any example $\mathcal{Y}$ is usually a Euclidean space, mostly even a
one dimensional Euclidean space). An estimate is a statistic $f(X)$ taking on
values in $\mathcal{Y}$. We denote by $W[g(P), f(x)]$ the loss which results from making
the estimate $f(x)$ when $P$ is the true distribution, and we define the risk function
of the estimate $f$ by

(2.2) $R_f(P) = E_P W[g(P), f(X)]$.

The problem is to determine $f$ so as to minimize $\sup_{P \in \mathcal{F}} R_f(P)$.

Our principal tool will be the following theorem, which is essentially contained 
in Wald’s work but which is not stated there explicitly. The theorem is a slight 
modification of one used for the theory of testing in [8]. 

THEOREM 2.1. Let $\{P_\theta\}$, $\theta \in \omega$ (where $\omega$ is a subset of a Euclidean space), be a
parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a probability measure over $\omega$. Suppose that
$f$ minimizes

(2.3) $\int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta)$

and that

(i) $E_\theta W[g(P_\theta), f(X)]$ is constant (say $c$) for all $\theta \in \omega$,
(ii) $E_P W[g(P), f(X)] \le c$ for all $P$ in $\mathcal{F}$.

Then $f$ is a minimax estimate for estimating $g$.

PROOF. Let $f^*$ be any other estimate of $g$. Then

(2.4) $\sup_{P \in \mathcal{F}} E_P W[g(P), f(X)] = \int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta)$
      $\le \int_\omega E_\theta W[g(P_\theta), f^*(X)] \, d\lambda(\theta)$
      $\le \sup_{P \in \mathcal{F}} E_P W[g(P), f^*(X)]$.














We note that if f is the unique function minimizing (2.3), then the first in- 
equality in (2.4) becomes strict, and hence f is the unique minimax estimate of g. 

Following Wald we shall call the function $f$ that minimizes (2.3) the Bayes
estimate of $g$ associated with the a priori distribution $\lambda$. As a corollary to theorem
2.1, we note that a Bayes estimate whose risk function is constant, is a minimax
estimate.
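
As an illustration of the corollary, consider the single binomial trial mentioned in the Introduction, with squared error as loss. The computation below verifies that the estimate f(X) = ½X + ¼ has constant risk. That f is also a Bayes estimate is taken here as a standard fact without proof: under a Beta(½, ½) a priori distribution the posterior mean of p after one trial is (X + ½)/2.

```latex
\begin{aligned}
R_f(p) &= (1-p)\left(\tfrac14 - p\right)^2 + p\left(\tfrac34 - p\right)^2 \\
       &= \tfrac1{16}
          + \left(-\tfrac12 - \tfrac1{16} + \tfrac9{16}\right)p
          + \left(1 + \tfrac12 - \tfrac32\right)p^2
          + (-1 + 1)\,p^3 \\
       &= \tfrac1{16} \qquad \text{for all } 0 \le p \le 1 .
\end{aligned}
```

Since every coefficient of p vanishes, the risk is the constant 1/16, and the corollary then gives the minimax property of f.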















3. Randomization. In the formulation of the problem of point estimation given
above, the estimate $f(x)$ is assumed to be completely determined by the observed
value $x$ of the random variable $X$. In the present section a broader formulation
of the problem will be considered, in which the estimate corresponding to $x$ may
itself be a random variable, say $T_x$. This extension is a special case of the notion
of randomized decision function introduced by Wald in his general decision
theory. We associate with each $x$ in $\mathcal{X}$ a probability distribution $F_x$, with the
convention that when $X$ is observed to have the value $x$, we estimate $g(P)$ by
means of a random variable $T_x$ which is distributed according to $F_x$. Estimates of
this latter kind we shall call randomized, and the fixed estimates $f(x)$
nonrandomized.

The motivation behind the admission of randomized estimates (or more generally
of randomized statistical decision functions) is that in some problems of
statistical inference the performance of the decision function is considerably im- 
proved by randomization. It is clear however that the randomized functions are 
more complicated, and hence that it is useful to know when their consideration 
is not necessary. Before investigating this question we give the following defini- 
tion, which makes precise a sense in which certain estimates may be omitted from 
consideration. (See Wald [9]). 

DEFINITION. For a given estimation problem a class $C$ of estimates will be
said to be essentially complete with respect to a class $D$ of estimates, if for every
estimate $g$ in $D$ there exists an estimate $f$ in $C$ such that $R_f(P) \le R_g(P)$ for all
$P$ in $\mathcal{F}$. If $D$ is the class of all randomized estimates we simply say that $C$ is
essentially complete for the given problem.

It is clear that if one adopts the risk function point of view, one loses nothing 
by restricting consideration to an essentially complete class of estimates. In the 
present section we find conditions under which the totality of nonrandomized 
estimates forms an essentially complete class.

For this purpose we need the notion of convexity. A set $S$ in a $k$-dimensional
Euclidean space is said to be convex if, whenever $P$ and $Q$ are in $S$, then all
points on the line segment from $P$ to $Q$ are also in $S$. A real valued function $\psi$
defined over a $k$-dimensional Euclidean space is said to be convex, if for any
points $(x_1, \ldots, x_k)$ and $(y_1, \ldots, y_k)$ of the space, and any number $0 < \alpha < 1$
we have

(3.1) $\alpha \psi(x_1, \ldots, x_k) + (1 - \alpha) \psi(y_1, \ldots, y_k) \ge \psi(\alpha x_1 + (1 - \alpha) y_1, \ldots, \alpha x_k + (1 - \alpha) y_k)$.


We use the following notation for conditional expectation. If U and V are 
two random variables which have a joint distribution, then E(U|v) denotes the 
conditional expectation of U given that V = v; E(U|S) denotes the conditional 
expectation of $U$ given that $V$ is in $S$. Let $\Phi(v) = E(U \mid v)$; then for $\Phi(V)$ we write
$E(U \mid V)$.

LEMMA 3.1. Let $U$, $V$ be two random variables with a joint distribution, such that
$U$ is distributed in a $k$-dimensional space and $E(U)$ is finite. Let $\psi$ be a real-valued
convex function defined over this space and bounded from below. Then

$E[\psi\{E(U \mid V)\}] \le E\{\psi(U)\}$.


PROOF. The proof is immediate in the special case that, for almost all $v$, there
exists a determination of the conditional probability distribution of $U$ given $v$
which is a measure. We then know, from the convexity of $\psi$, that for almost all
values $v$ of $V$, $\psi\{E(U \mid v)\} \le E\{\psi(U) \mid v\}$. Replacing $v$ by $V$ and taking
expectations of both sides, we obtain the desired result.

If we do not assume the existence of conditional measures, the proof is more
complicated. Since $E(U)$ is finite, there exists a function $E(U \mid v)$ such that for
any set $S$, $E(U \mid S) = E\{E(U \mid V) \mid S\}$; see [10], p. 47. Since $\psi$ is convex it is
measurable, and since $\psi$ is bounded from below $E\{\psi(U)\}$ exists. Excluding the
trivial case $E\{\psi(U)\} = \infty$, we know there exists a function $E\{\psi(U) \mid v\}$ such
that for any set $S$, $E\{\psi(U) \mid S\} = E\{E\{\psi(U) \mid V\} \mid S\}$.

If the lemma were false, we should have $E\{E\{\psi(U) \mid V\}\} < E\{\psi\{E(U \mid V)\}\}$,
and could find an $\epsilon > 0$ and a set $A$ of positive $V$ measure such that for every
$v \in A$, $E\{\psi(U) \mid v\} + 2\epsilon < \psi\{E(U \mid v)\}$. This implies the existence of a number $d$
and a set $B$ of positive $V$ measure such that for every $v \in B$, $E\{\psi(U) \mid v\} \le d$
and $d + \epsilon \le \psi\{E(U \mid v)\}$. Since $\psi$ is convex, the domain $D$ of points $P$ for which
$\psi(P) < d + \epsilon$ is convex, and we may find a subset $C$ of $B$, of positive $V$ measure,
for which the set of points $E(U \mid v)$, $v \in C$, lies in a convex domain $E$ disjoint from $D$.
It follows that $E(U \mid C)$ lies in $E$, and hence that $\psi\{E(U \mid C)\} \ge d + \epsilon$. Clearly
$d \ge E\{\psi(U) \mid C\}$. Thus we have the contradiction $E\{\psi(U) \mid C\} < \psi\{E(U \mid C)\}$.

DEFINITION. A loss function $W$ will be called convex if for every $u \in \mathcal{Y}$, $W(u, v)$
is a convex function of the estimate $v$.

An example of a convex loss function is provided by the Markoff principle of
estimation. The variance of an unbiased estimate may be considered as a risk
function if we take the loss function to be the squared error, i.e. the square of
the difference between the true value $g(P)$ and the estimated value $f(x)$ or $T_x$;
and this loss function is clearly convex.

THEOREM 3.2. If the loss function $W$ is convex, if $\mathcal{Y}$ is in a Euclidean space,
and if we consider only estimates having finite expectation, then the class of
nonrandomized estimates is essentially complete.

PROOF. Let $T_X$ be any randomized estimate such that $E(T_X)$ exists and is
finite. Applying lemma 3.1 we see that $E(T_X \mid X)$, which as a function of $X$ only
is a nonrandomized estimate, has a risk never greater than that of $T_X$.

The restriction in theorem 3.2 to estimates having finite expectation may be
replaced by the requirement that for each $u \in \mathcal{Y}$ there exist a number $M_u$ such
that if $|v - u| \ge M_u$ then $W(u, v) > W(u, u)$. With this requirement and the
convexity assumption, it follows that the risk associated with $T_X$ is infinite
whenever $E(T_X)$ is infinite.
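
The mechanism of theorem 3.2 can be seen in a small numerical sketch. The setting is an illustrative assumption of ours: a single binomial trial, squared error loss, a nonrandomized rule `h`, and the randomized estimate T_x = h(x) + aY, where Y is +1 or -1 with probability ½ each, independently of X. Then E(T_X | X) = h(X), and the risk of T_X exceeds that of h by exactly a².

```python
from fractions import Fraction as Fr

def risk_nonrandomized(h, p):
    """Squared-error risk of h(X) when X is 1 with probability p, else 0."""
    return (1 - p) * (h(0) - p) ** 2 + p * (h(1) - p) ** 2

def risk_randomized(h, a, p):
    """Risk of T_x = h(x) + a*Y, with Y = +1 or -1 with probability 1/2 each."""
    total = Fr(0)
    for x, prob_x in ((0, 1 - p), (1, p)):
        for y in (1, -1):
            total += prob_x * Fr(1, 2) * (h(x) + a * y - p) ** 2
    return total

def h(x):
    """Illustrative nonrandomized estimate."""
    return Fr(1, 4) + Fr(1, 2) * x

a = Fr(1, 10)
grid = [Fr(i, 20) for i in range(21)]
excess = [risk_randomized(h, a, p) - risk_nonrandomized(h, p) for p in grid]
print(set(excess))
```

Replacing the randomized estimate by its conditional expectation given X therefore lowers the risk at every p, as the theorem asserts for any convex loss.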

Theorem 3.2 is related to a generalization of a theorem of Blackwell. If Y is a 
sufficient statistic for g(P), and if for almost all y the conditional distribution of 
X given y exists in the sense of measure, we may regard estimation of g(P) based 
on X as randomized estimation of g(P) based on Y; and if the assumptions of 
theorem 3.2 are satisfied, we may apply this theorem to conclude the essential
completeness of the class of nonrandomized estimates based on Y. In the general 
case we may resort again to lemma 3.1 to prove the following theorem; the proof 
is the same as that of theorem 3.2 if X is replaced by Y throughout. 

THEOREM 3.3. If the loss function $W$ is convex, if $\mathcal{Y}$ is in a Euclidean space, if
we consider only estimates having a finite expectation, and if $Y$ is a sufficient statistic
for $\mathcal{F}$, then the class of nonrandomized estimates which are functions of $Y$ only is
essentially complete.

Blackwell [11] proved that if $U$ is a sufficient statistic for a real-valued
parameter $\theta$, and if $T$ is an unbiased estimate for $\theta$, then $E(T \mid U)$, which is a function of
$U$ only and also an unbiased estimate for $\theta$, has a variance which never exceeds
that of $T$. Observing that the theorems above hold true when we restrict attention
to unbiased estimates, Blackwell's result may be obtained from theorem 3.3 by
letting $\mathcal{Y}$ be one-dimensional, letting $W$ be the squared error, and restricting
ourselves to unbiased estimates. In a similar manner we can get from theorem 3.3
an extension of Blackwell's theorem given by Barankin [12], who treated the
case in which $W(\theta, t) = |\theta - t|^s$, $s > 1$. It is clear that these loss functions are
convex.

If the convexity assumption is removed, theorems 3.2 and 3.3 cease to be
true. For example, if $\mathcal{X}$ has only $n$ points, if $\mathcal{Y}$ is a finite line segment of length
greater than $2na$, and if the loss is 0 whenever $|g(P) - f(x)| < a$, and 1 otherwise,
then the minimax risk among nonrandomized estimates is 1. By admitting
randomization, however, the maximum risk can be brought below 1 without using $X$
at all; if our estimate $T$ is uniformly distributed over $\mathcal{Y}$, then the maximum risk
will be $1 - a/(\text{length of } \mathcal{Y})$.
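
The two risks in this example can be computed directly. In the sketch below the segment is taken to be [0, length], and the numerical values (length 1, a = 1/10, four estimate values) are illustrative assumptions satisfying length > 2na. The function names are ours.

```python
from fractions import Fraction as Fr

def max_risk_uniform(length, a, steps=200):
    """Maximum over g in [0, length] of P(|g - T| >= a), T uniform on [0, length]."""
    worst = Fr(0)
    for i in range(steps + 1):
        g = length * Fr(i, steps)
        # length of the part of (g - a, g + a) that lies inside the segment
        covered = min(g + a, length) - max(g - a, 0)
        worst = max(worst, 1 - covered / length)
    return worst

def has_uncovered_point(values, length, a, steps=200):
    """True if some g on the grid is at distance >= a from every value the
    nonrandomized estimate can take; at such a g the loss is 1 with certainty."""
    for i in range(steps + 1):
        g = length * Fr(i, steps)
        if all(abs(g - v) >= a for v in values):
            return True
    return False

LENGTH, A = Fr(1), Fr(1, 10)
VALUES = [Fr(1, 8), Fr(3, 8), Fr(5, 8), Fr(7, 8)]   # n = 4 points, 2na = 4/5 < 1

print(has_uncovered_point(VALUES, LENGTH, A))   # nonrandomized: maximum risk 1
print(max_risk_uniform(LENGTH, A))              # randomized: 1 - a/length
```

The maximum risk of the uniformly distributed estimate is attained at the end points of the segment, where only an interval of length a around g is covered.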

The example just given may seem inappropriate, in that with the specified loss
function the problem would customarily be considered one of interval estimation
rather than point estimation. This objection does not apply however to the loss
functions considered in the following theorem.

THEOREM 3.4. Let $\mathcal{X} = \{0, 1, \ldots, n\}$, $n \ge 1$. Let $\mathcal{F}$ be the set of binomial
distributions $P_p$ defined by $P_p(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, $0 \le p \le 1$. Let $\mathcal{Y}$ be the
closed interval $[0, 1]$ and $g(P_p) = p$. Let $W(p, t) = |p - t|^s$, $0 < s < 1$. Then no
minimax estimate can be nonrandomized, and the class of nonrandomized estimates
is not essentially complete.
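
The randomization device behind this theorem can be illustrated in the simplest binomial setting of a single trial. The sketch below is not the minimax computation itself: the nonrandomized estimate `h` (with values 0.25 and 0.75), the exponent s = ½ and the perturbation a = 0.05 are illustrative assumptions, and `h` is not claimed to be minimax. The point is only that replacing h(x) by h(x) ± a, each with probability ½, lowers the maximum risk when the loss is |p - t|^s with s < 1.

```python
def risk_h(p, h0, h1, s):
    """Risk of the nonrandomized estimate h for one trial (n = 1)."""
    return (1 - p) * abs(p - h0) ** s + p * abs(p - h1) ** s

def risk_T(p, h0, h1, s, a):
    """Risk of T_x = h(x) + a*Y, Y = +1 or -1 with probability 1/2 each."""
    def averaged_loss(center):
        return 0.5 * (abs(p - center - a) ** s + abs(p - center + a) ** s)
    return (1 - p) * averaged_loss(h0) + p * averaged_loss(h1)

S, A = 0.5, 0.05
H0, H1 = 0.25, 0.75            # illustrative h, not claimed to be minimax
GRID = [i / 1000 for i in range(1001)]

max_h = max(risk_h(p, H0, H1, S) for p in GRID)
max_T = max(risk_T(p, H0, H1, S, A) for p in GRID)
print(max_h, max_T)
```

On this grid the maximum risk of `h` is attained at p = 0, ½ and 1, which are exactly the points where the concavity of the loss makes randomization profitable.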

PROOF. For any nonrandomized estimate $f$, $R_f(p)$, being a sum of products of
continuous functions of $p$, is itself a continuous function of $p$. The nonrandomized
minimax risk is less than 1, as may be shown by considering any estimate of the
following kind: $f(0) = 0$, $f(n) = 1$, and $0 \le f(x) \le 1$ for all $x$. Here $R_f(0) =
R_f(1) = 0$, while if $0 < p < 1$, $R_f(p) \le \max_x |p - f(x)|^s < 1$. By continuity
$\sup_{0 \le p \le 1} R_f(p) < 1$.

It is easy to see that there exists among the nonrandomized estimates a minimax
estimate, say $h$. Let the corresponding minimax risk be denoted by $M$. We know
that $M = \sup_{0 \le p \le 1} R_h(p) < 1$; it is obvious that $M > 0$. Observe that $h(0) < 1$,
since $h(0) \ge 1$ leads to the contradiction $R_h(0) = |h(0)|^s \ge 1$. We can write

$R_h(p) = \sum_{h(x) = h(0)} P_p(X = x) \cdot |p - h(x)|^s + \sum_{h(x) \ne h(0)} P_p(X = x) \cdot |p - h(x)|^s$.

The second sum has a finite derivative with respect to $p$ at $p = h(0)$, while the
first sum increases with infinite speed as $p$ is moved away from $h(0)$. Therefore
$R_h\{h(0)\} < M$; and by an exactly symmetrical argument, $0 < h(n)$ and
$R_h\{h(n)\} < M$. Using the continuity of $R_h$, we can find a positive number $\omega$ so
small that $R_h(p) < M$ whenever $|p - h(0)| < \omega$ or $|p - h(n)| < \omega$.

Consider now the randomized estimate $T_x$ defined by $T_x = h(x)$ if $0 < x < n$,
and by $T_x = h(x) + aY$ otherwise, where $Y$ is a random variable independent of
$X$ and taking on the values 1 and $-1$ each with probability $\tfrac12$, and where
$0 < a < \omega$. Observe

$R_{T_X}(p) - R_h(p) = (1 - p)^n \left[ \tfrac12 \{ |p - h(0) + a|^s + |p - h(0) - a|^s \} - |p - h(0)|^s \right]$
$\qquad + \; p^n \left[ \tfrac12 \{ |p - h(n) + a|^s + |p - h(n) - a|^s \} - |p - h(n)|^s \right]$.


By the concavity of the functions involved, the first square bracketed term is
negative whenever $|p - h(0)| \ge a$, and the second is negative whenever
$|p - h(n)| \ge a$. We can choose $a$ so small that whenever either $|p - h(0)|$
or $|p - h(n)|$ is less than $a$, $R_{T_X}(p) - R_h(p) < \omega$. A continuity argument
now shows that $\sup_{0 \le p \le 1} R_{T_X}(p) < M$. But this proves that no minimax
estimate, with randomization permitted, can be nonrandomized. It is also now
obvious that the class of nonrandomized estimates is not essentially complete:
every nonrandomized estimate must have a risk function which somewhere
exceeds $\sup_{0 \le p \le 1} R_{T_X}(p)$.

4. General properties of minimax estimation. Whether a principle such as the
minimax principle is a desirable one has to be decided mainly on two criteria:
(i) its general properties, and 

(ii) its performance in many particular instances. 

It has already been remarked that in the second respect the minimax principle 
does not seem entirely satisfactory. With regard to the former, one great ad- 
vantage of this principle is that when there is a unique minimax estimate, it is 
admissible. Here an estimate $f$ is said to be admissible (see [3]) if there exists no
other estimate $f^*$ such that $R_{f^*}(P) \le R_f(P)$ for all $P$ in $\mathcal{F}$ with strict inequality
holding for some P. It is interesting that, as we shall show below, this admissi- 
bility property is not shared by either the principle of unbiasedness or the maxi- 
mum likelihood principle. 

In this connection we begin by proving another theorem concerning essentially 
complete classes. 

THEOREM 4.1. Suppose that the space $\mathcal{Y}$ is a finite interval $[a, b]$ on the real line,
and that for each $u \in \mathcal{Y}$, $W(u, v)$ is a non-decreasing function of $v$ when $v > u$
and a non-increasing function of $v$ when $v < u$. Then the class of estimates whose
range is contained in $\mathcal{Y}$ is essentially complete with respect to the class of all real
valued estimates.

PROOF. If $T$ is any real-valued estimate, define $T^*$ by

(4.1) $T^* = \begin{cases} T & \text{if } a \le T \le b, \\ a & \text{if } T < a, \\ b & \text{if } T > b. \end{cases}$

It is clear that $R_{T^*}(P) \le R_T(P)$ for every $P \in \mathcal{F}$.
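
The modification (4.1) is simple to carry out. The sketch below is an illustration under assumptions of our own choosing: X is binomial with n = 3, the parameter p is supposed known to lie in [1/4, 3/4], the loss is squared error, and T is the observed proportion, which can fall outside that interval.

```python
from fractions import Fraction as Fr
from math import comb

def truncate(t, a, b):
    """The modification (4.1): values outside [a, b] are moved to the nearer end point."""
    if t < a:
        return a
    if t > b:
        return b
    return t

def risk(estimate, p, n):
    """Squared-error risk of estimate(X) for X binomial (n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (estimate(x) - p) ** 2
               for x in range(n + 1))

N = 3
LOW, HIGH = Fr(1, 4), Fr(3, 4)        # assumed known range of p

def T(x):
    """Observed proportion; takes the values 0 and 1, which lie outside [LOW, HIGH]."""
    return Fr(x, N)

def T_star(x):
    return truncate(T(x), LOW, HIGH)

grid = [LOW + (HIGH - LOW) * Fr(i, 50) for i in range(51)]
never_worse = all(risk(T_star, p, N) <= risk(T, p, N) for p in grid)
print(never_worse)
```

Because every truncated value is at least as close to each admissible p as the original value, the inequality holds pointwise and hence in expectation.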

Halmos [7] has provided an example in which the minimum variance unbiased 
estimate takes on, with positive probability, values outside the range of the 
parameter. It can be shown from the proof of theorem 4.1 that in this case any 
unbiased estimate is inadmissible, provided the loss function is of the kind 
described in theorem 4.1. 

That the maximum likelihood principle may also lead to inadmissible esti- 
mates is easy to show, since this is the case in many familiar situations. The 
following example may be of interest in that here the maximum likelihood 
estimate is uniformly worst among all estimates which one would consider 
using. 

Example. Let $X$ be a random variable with only 0 and 1 as possible values, and
let $P(X = 1) = p$. Assume it to be known that $\frac{1}{3} \le p \le \frac{2}{3}$. Then the maximum
likelihood estimate for $p$ is easily seen to be $\frac{1}{3}(X + 1)$, and, if the loss function
is the squared error, the associated risk function is $\frac{1}{3}(p - \frac{1}{2})^2 + \frac{1}{36}$. This risk
function is, for every possible value of $p$, greater than that of any estimate $f(x)$
satisfying: $\frac{1}{3} < f(0) \le f(1) = 1 - f(0) < \frac{2}{3}$.
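The claim in this example can be checked numerically. The following is only a sketch, assuming the symmetric family of estimates $f(0) = c$, $f(1) = 1 - c$ with $\frac{1}{3} < c \le \frac{1}{2}$; the function names and the grid are illustrative choices, not part of the original text.

```python
from fractions import Fraction

def risk_symmetric(c, p):
    """Squared-error risk of the estimate f(0) = c, f(1) = 1 - c when P(X = 1) = p."""
    return p * (1 - c - p) ** 2 + (1 - p) * (c - p) ** 2

def mle_risk(p):
    """Risk of the maximum likelihood estimate (X + 1)/3, which is the case c = 1/3."""
    return risk_symmetric(Fraction(1, 3), p)

# Grid of p over [1/3, 2/3] and of c over (1/3, 1/2]; both grids are arbitrary.
ps = [Fraction(1, 3) + Fraction(k, 300) for k in range(101)]
cs = [Fraction(1, 3) + Fraction(k, 600) for k in range(1, 101)]

# The maximum likelihood estimate has strictly larger risk at every grid point.
mle_is_worst = all(risk_symmetric(c, p) < mle_risk(p) for c in cs for p in ps)

# The risk of the maximum likelihood estimate agrees with (p - 1/2)^2 / 3 + 1/36.
closed_form_ok = all(
    mle_risk(p) == (p - Fraction(1, 2)) ** 2 / 3 + Fraction(1, 36) for p in ps
)
```

Exact rational arithmetic is used so that the comparisons are not affected by rounding.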

The selection of loss function in any problem should in theory be governed by 
metastatistical considerations, but in fact the circumstances of statistical prob- 
lems do not usually offer compelling reasons for using one loss function rather 
than another. Considerations of mathematical facility are often determining.
Thus, various classical unbiased estimates become minimax estimates when the 
loss function is judiciously chosen. For, if we take as loss function the ratio of 
squared error to the variance of the unbiased estimate, the risk becomes constant, 
and we can easily obtain the classical estimates as minimax estimates in the 
familiar binomial, Poisson, and rectangular problems, and in some of the non- 
parametric problems considered in section 6. 

However, this approach seems to be somewhat artificial, and hereafter we 
shall restrict ourselves to a single loss function, namely the squared error. There 
are two reasons for this choice. With squared error for the loss, the mathematical 
problems are rather simple. And as was remarked above, squared error (if one 
restricts oneself to unbiased estimates) is the traditional loss function. Fortun- 
ately, the squared error loss function is convex, and hence theorem 3.2 permits 
us to avoid considering randomized estimates. 

When the loss function is squared error, we have the following obvious linearity 
property, which for later reference we state as 

THEOREM 4.2. If f(X) is the minimax estimate for g(P), then af(X) + b is the 
minimax estimate for a - g(P) + b. 

However, as we shall show by an example in the next section, it need not be
true that if $X_1, \ldots, X_n$ are independent and $f_i(X_i)$ is the minimax estimate for
$g_i(P_i)$, $i = 1, \ldots, n$, then $\sum_{i=1}^{n} a_i f_i(X_i)$ is the minimax estimate for
$\sum_{i=1}^{n} a_i g_i(P_i)$. This is a definite disadvantage of the minimax principle as
compared with the Markoff principle which does possess the linearity property
mentioned.

We conclude this section with an explicit solution of the Bayes problem in the
squared error case. If the distribution $P$ is itself a random variable distributed
over $\mathcal{F}$ according to some distribution $\lambda$, we may compare estimates $f$ by means
of their expected loss $Q(f) = E[g(P) - f(X)]^2$. Since
$Q(f) = E\{E([g(P) - f(X)]^2 \mid X)\}$, it is well known that $Q(f)$ is minimized by using
the estimate $f(x) = E[g(P) \mid x]$, provided the conditional measures exist. In fact,
this result holds even without this assumption.

THEOREM 4.3. $E[g(P) - f(X)]^2$ is minimized by $f(x) = E[g(P) \mid x]$.

PROOF. $E[g(P) - f(X)]^2 - E\{g(P) - E[g(P) \mid X]\}^2 = E\{E[g(P) \mid X] - f(X)\}^2$
$+ 2E\bigl[E\bigl(\{g(P) - E[g(P) \mid X]\}\{E[g(P) \mid X] - f(X)\} \mid X\bigr)\bigr] \ge 0$.

In applications it is convenient to write $E[g(P) \mid X]$ more explicitly. Suppose
that with respect to some measure $\mu$ over $\mathcal{X}$, each distribution $P \in \mathcal{F}$ has a
generalized probability density $p_P$, so that for any $A$, the probability that $X \in A$,
computed for $P$, is given by

$\int_A p_P(x)\, d\mu(x)$.

Minimizing a quadratic expression shows that

(4.2)   $f(x) = \dfrac{\int g(P)\, p_P(x)\, d\lambda(P)}{\int p_P(x)\, d\lambda(P)}$

is a Bayes solution.
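As an illustration of formula (4.2), the sketch below evaluates the ratio of the two integrals when $X$ is binomial, $g(P) = p$, and the a priori distribution $\lambda$ is concentrated on finitely many values of $p$, so that the integrals become sums. The function name `bayes_estimate` and the two-point prior in the usage note are assumptions made for the example only.

```python
from fractions import Fraction
from math import comb

def bayes_estimate(x, n, support, weights):
    """Evaluate (4.2) for X ~ Binomial(n, p), g(P) = p, and a discrete prior.

    support: the values of p carrying prior mass; weights: their prior probabilities.
    """
    # Binomial density p_P(x) at the observed x for each p in the support.
    dens = [comb(n, x) * p ** x * (1 - p) ** (n - x) for p in support]
    numerator = sum(p * d * w for p, d, w in zip(support, dens, weights))
    denominator = sum(d * w for d, w in zip(dens, weights))
    return numerator / denominator

# Hypothetical two-point prior: p = 1/4 or p = 3/4 with equal probability.
support = [Fraction(1, 4), Fraction(3, 4)]
weights = [Fraction(1, 2), Fraction(1, 2)]
estimate_after_two_successes = bayes_estimate(2, 2, support, weights)
```

With two successes in two trials the estimate is pulled toward the larger support point, and with one success in two trials the symmetry of this prior returns 1/2.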


5. Binomial and hypergeometric distributions. In the present section we shall 
consider three discrete minimax problems. 
PROBLEM 1. (Binomial.) Let $X$ be a binomial random variable with parameter
$p$, $0 \le p \le 1$, so that $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$. We shall show that the
minimax estimate for $p$ is

(5.1)   $\dfrac{X}{\sqrt{n}(\sqrt{n} + 1)} + \dfrac{1}{2(\sqrt{n} + 1)}$.

Consider any linear estimate $\alpha X + \beta$. The risk $E_p(\alpha X + \beta - p)^2$ is a quadratic
function of $p$ which is constantly equal to $\beta^2$ when $\alpha = \dfrac{1}{\sqrt{n}(1 + \sqrt{n})}$ and
$\beta = \dfrac{1}{2(1 + \sqrt{n})}$. Hence (5.1) is a constant risk estimate of $p$. Since it is easily
seen that

$\dfrac{\int_0^1 p \cdot p^k q^{n-k}\, p^{a-1} q^{b-1}\, dp}{\int_0^1 p^k q^{n-k}\, p^{a-1} q^{b-1}\, dp} = \dfrac{a + k}{a + b + n}$,   $(q = 1 - p)$,

it follows that (5.1) is the Bayes estimate when $p$ is distributed with probability
density $C(pq)^{\sqrt{n}/2 - 1}$, and hence by Theorem 2.1 we conclude that (5.1) is the
minimax estimate of $p$.

After obtaining this result we were informed that it had been obtained earlier
by H. Rubin, to whom, therefore, the priority belongs.

It is interesting to compare the risk of the above estimate with that of the
standard unbiased estimate $X/n$. We have

$E\left(\dfrac{X}{n} - p\right)^2 = \dfrac{pq}{n}$,   $E\left[\dfrac{X}{\sqrt{n}(\sqrt{n} + 1)} + \dfrac{1}{2(\sqrt{n} + 1)} - p\right]^2 = \dfrac{1}{4(1 + \sqrt{n})^2}$.

As is easily seen, $\dfrac{pq}{n} \le \dfrac{1}{4(1 + \sqrt{n})^2}$ if and only if

$\left| p - \dfrac{1}{2} \right| \ge \dfrac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})}$.

Thus the standard estimate is better than the minimax estimate outside an
interval around $p = \frac{1}{2}$ whose length decreases with increasing $n$, tending to 0 as
$n$ tends to infinity. However, for very small values of $n$ the minimax estimate has
the smaller risk over nearly the whole range.
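The constant-risk property of (5.1) and the comparison with $X/n$ can be checked by direct summation over the binomial distribution. The code below is a sketch under that reading of the text; the function names are illustrative and not taken from the paper.

```python
from math import comb, sqrt

def risk(n, p, alpha, beta):
    """Exact squared-error risk of alpha * X + beta as an estimate of p, X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) * (alpha * k + beta - p) ** 2
        for k in range(n + 1)
    )

def minimax_coefficients(n):
    """Coefficients (alpha, beta) of the estimate (5.1)."""
    r = sqrt(n)
    return 1 / (r * (1 + r)), 1 / (2 * (1 + r))

def crossover_halfwidth(n):
    """Distance from p = 1/2 at which X/n and (5.1) have equal risk."""
    r = sqrt(n)
    return sqrt(1 + 2 * r) / (2 * (1 + r))

n = 16
alpha, beta = minimax_coefficients(n)
minimax_risks = [risk(n, p, alpha, beta) for p in (0.0, 0.1, 0.5, 0.8, 1.0)]
# Risk of the standard estimate X/n at the edge of the crossover interval.
standard_risk_at_edge = risk(n, 0.5 + crossover_halfwidth(n), 1 / n, 0.0)
```

For $n = 16$ the constant risk is $\beta^2 = 0.01$, and the two estimates have equal risk at $p = 0.5 \pm 0.3$.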

PROBLEM 2. (Difference of binomials.) Let $X$ and $Y$ be independent binomial
random variables, where $P(X = k) = \binom{n}{k} p_1^k (1 - p_1)^{n-k}$ and
$P(Y = l) = \binom{n}{l} p_2^l (1 - p_2)^{n-l}$. By use of theorem 2.1 we shall show that the
minimax estimate for $p_1 - p_2$ is

$\dfrac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \dfrac{X}{n} - \dfrac{Y}{n} \right)$.

For the set $\omega$ of theorem 2.1 we take $p_1 = p$, $p_2 = 1 - p$, $0 \le p \le 1$, and we let
$Z = X + n - Y$. Applying the result of Problem 1 to $Z$, we find the minimax
estimate of $p$ to be $\alpha_{2n} Z + \beta_{2n}$, and by Theorem 4.2 the minimax estimate
based on $Z$ for $p_1 - p_2 = 2p - 1$ is

$\dfrac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \dfrac{X}{n} - \dfrac{Y}{n} \right)$,

and the risk of this estimate is constant over $\omega$.

To prove that this is also the minimax estimate of $p_1 - p_2$ for the original
problem, we consider the risk as a function of $p_1$ and $p_2$. It is easy to show that
$(1 + \sqrt{2n})^2 R(p_1, p_2) = 2[p_1(1 - p_1) + p_2(1 - p_2)] + (p_1 - p_2)^2$. Finally it
can be shown that $p_1(1 - p_1) + p_2(1 - p_2)$ is maximized, subject to the condition
that $p_1 - p_2$ be constant, when $p_1 + p_2 = 1$.

PROBLEM 3. (Hypergeometric.) We finally consider the problem of estimating
the number of defectives in a lot from a sample drawn from this lot at random.
We denote by $N$ and $n$ the number of elements in lot and sample respectively,
and by $D$ and $X$ the corresponding number of defectives. For later reference we
note

$P(X = k) = \dfrac{\binom{D}{k}\binom{N - D}{n - k}}{\binom{N}{n}}$,

$E(X) = \dfrac{nD}{N}$,

$\sigma_X^2 = \dfrac{nD(N - n)(N - D)}{N^2(N - 1)}$.

As in Problem 1 we easily find a linear function of $X$ whose risk is constant.
In fact

$E_D(\alpha X + \beta - D)^2 = \beta^2$

when

$\alpha = \dfrac{N}{n + \sqrt{\dfrac{n(N - n)}{N - 1}}}$,   $\beta = \dfrac{N}{2}\left(1 - \dfrac{\alpha n}{N}\right)$.
To prove that $\alpha X + \beta$ is the minimax estimate of $D$ we shall show that it is the
Bayes estimate corresponding to

(5.2)   $P(D = d) = \int_0^1 \binom{N}{d} p^d q^{N-d} \cdot C p^{a-1} q^{b-1}\, dp$,

where $a, b > 0$, and

$C = \dfrac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}$.

In this connection it is useful to notice that since (5.2) is a distribution

(5.3)   $\sum_{d=0}^{N} \binom{N}{d} \Gamma(a + d)\,\Gamma(N + b - d) = \dfrac{\Gamma(N + a + b)\,\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$.

Using theorem 4.3, we find the Bayes estimate associated with (5.2) to be

$f(k) = \dfrac{\sum_{d=k}^{N-n+k} d \binom{d}{k}\binom{N-d}{n-k}\binom{N}{d}\Gamma(a + d)\,\Gamma(N + b - d)}{\sum_{d=k}^{N-n+k} \binom{d}{k}\binom{N-d}{n-k}\binom{N}{d}\Gamma(a + d)\,\Gamma(N + b - d)}$.

Replacing $d$ by $(d + a) - a$, and using the relation

$\binom{d}{k}\binom{N-d}{n-k}\binom{N}{d} = \binom{N-n}{d-k} \cdot$ (terms not involving $d$),

we find:

$f(k) = \dfrac{\sum_{d=k}^{N-n+k} \binom{N-n}{d-k}\Gamma(a + d + 1)\,\Gamma(N + b - d)}{\sum_{d=k}^{N-n+k} \binom{N-n}{d-k}\Gamma(a + d)\,\Gamma(N + b - d)} - a$.

Now apply (5.3) to numerator and denominator separately; then

$f(k) = \dfrac{a + b + N}{a + b + n}\,k + \dfrac{a(N - n)}{a + b + n}$.

Putting $\dfrac{a + b + N}{a + b + n} = \alpha$, $\dfrac{a(N - n)}{a + b + n} = \beta$, one obtains easily

$a = \dfrac{\beta}{\alpha - 1}$,   $b = \dfrac{N - \alpha n - \beta}{\alpha - 1}$.

Substituting the values of $\alpha$ and $\beta$ one finds that $\beta > 0$, $N > \alpha n + \beta$ and that
$\alpha > 1$ provided $N > n + 1$. In the special case $N = n$ the result is immediate,
while if $N = n + 1$, the result is obtained by giving to $D$ a binomial distribution
with $p = \frac{1}{2}$.
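The constant-risk property claimed for the hypergeometric problem can be checked by exact summation. The sketch below assumes the values $\alpha = N / \bigl(n + \sqrt{n(N - n)/(N - 1)}\bigr)$ and $\beta = \frac{N}{2}(1 - \alpha n / N)$ as read from Problem 3; the function names and the lot and sample sizes are illustrative.

```python
from math import comb, sqrt

def hypergeom_risk(N, n, D, alpha, beta):
    """Exact value of E(alpha * X + beta - D)^2 for a sample of n from a lot of N with D defectives."""
    total = comb(N, n)
    # comb returns 0 for impossible counts, so k may run over 0..n.
    return sum(
        comb(D, k) * comb(N - D, n - k) / total * (alpha * k + beta - D) ** 2
        for k in range(n + 1)
    )

def minimax_coefficients(N, n):
    """Coefficients of the constant-risk linear estimate of D."""
    alpha = N / (n + sqrt(n * (N - n) / (N - 1)))
    beta = N / 2 * (1 - alpha * n / N)
    return alpha, beta

N, n = 10, 4
alpha, beta = minimax_coefficients(N, n)
risks = [hypergeom_risk(N, n, D, alpha, beta) for D in range(N + 1)]
```

Every entry of `risks` should agree with `beta ** 2` up to rounding, whatever the number of defectives `D`.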
6. Non parametric problems. We shall in this section consider estimation
problems in which the functional form of the distribution of X is not assumed 
known. Restrictions will be imposed on the variables only to insure the existence 
of estimates with bounded risk. The problem will be treated under two different 
such restrictions: (i) that the variables are bounded with known bounds, (ii) that 
the variables have bounded variances. 

In the first of these cases we can assume without loss of generality that the 
variables are distributed over the interval [0, 1], and then obtain 

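THEOREM 6.1. Let $X_1, \ldots, X_n$ be independently distributed over $[0, 1]$ according
to a joint distribution belonging to a family $\mathcal{F}$. Suppose that $\mathcal{F}$ contains the subfamily
$\mathcal{F}_0$ according to which $X_1, \ldots, X_n$ are independently and identically distributed
with $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$, $0 \le p \le 1$. Let $E(X_i) = \mu_i$,
$\frac{1}{n}\sum_{i=1}^{n} \mu_i = \bar{\mu}$. Then the minimax estimate of $\bar{\mu}$ is

(6.1)   $\dfrac{1}{1 + \sqrt{n}}\left(\sqrt{n}\,\bar{X} + \dfrac{1}{2}\right)$.

PROOF. Since (6.1) is the minimax estimate of $\bar{\mu} = p$ when the distribution of
the $X$'s is known to belong to $\mathcal{F}_0$, we only need to show that its risk is largest for
the distributions of $\mathcal{F}_0$. But

$E(A\bar{X} + B - \bar{\mu})^2 = A^2\sigma_{\bar{X}}^2 + [B + (A - 1)\bar{\mu}]^2 = \dfrac{A^2}{n^2}\sum \sigma_{X_i}^2 + [B + (A - 1)\bar{\mu}]^2$

and

$\sum \sigma_{X_i}^2 = \sum E(X_i^2) - \sum \mu_i^2 \le \sum \mu_i - \sum \mu_i^2 = n\bar{\mu} - \sum(\mu_i - \bar{\mu})^2 - n\bar{\mu}^2 \le n\bar{\mu}(1 - \bar{\mu})$

where equality holds for the distributions in $\mathcal{F}_0$.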


COROLLARY 6.2. Let $X_1, \ldots, X_n$ be a sample from an unknown univariate
distribution over $[0, 1]$. Then the minimax estimate of $E(X_i) = \mu$ is given by (6.1).

COROLLARY 6.3. Let $X_1, \ldots, X_n$ be a sample from an unknown absolutely
continuous univariate distribution over $[0, 1]$. Then the minimax estimate of
$E(X_i) = \mu$ is given by (6.1).

Corollary 6.3 follows from the fact that any risk function that can be obtained 
for binomial distribution can be approximated by means of absolutely continuous 
distributions. 

Theorem 6.1 can be extended to include variables that are negatively
correlated. Namely if $X_1, \ldots, X_n$ are distributed over $[0, 1]$ according to a joint
distribution belonging to some family $\mathcal{F}$, if for each distribution of $\mathcal{F}$ the
correlation coefficient $\rho_{ij}$ of $X_i$, $X_j$ is $\le 0$ for all $i$, $j$, and if $\mathcal{F}$ contains the family $\mathcal{F}_0$ of
theorem 6.1, then the conclusion of this theorem remains valid. This result can
be used for example in the following situation. Suppose a sample of $n$ is taken
from a lot of unknown size, and suppose it is desired to estimate the proportion
$p$ of defectives in the lot. If $k$ is the number of defectives in the sample, it follows
from the above remarks that the minimax estimate of $p$ is

$\dfrac{1}{1 + \sqrt{n}}\left(\dfrac{k}{\sqrt{n}} + \dfrac{1}{2}\right)$.

It should be pointed out that this result holds only if no upper bound is assumed
known for the lot size. If it is known that the number of items in the lot is $\le N_0$,
then the minimax estimate is that found in section 5 for the case of a
hypergeometric distribution with $N = N_0$.
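The bound used in the proof of Theorem 6.1 can be illustrated by evaluating the risk of (6.1) for i.i.d. variables with a given mean and variance. This is a sketch only; the function names are illustrative, and the uniform distribution on $[0, 1]$ is chosen here as an example of a non-binomial member of the family.

```python
from math import sqrt

def risk_61(n, mu, var):
    """Risk of the estimate (6.1) for i.i.d. X_i on [0, 1] with mean mu and variance var."""
    A = sqrt(n) / (1 + sqrt(n))    # coefficient of the sample mean
    B = 1 / (2 * (1 + sqrt(n)))    # constant term
    return A ** 2 * var / n + (B + (A - 1) * mu) ** 2

def minimax_value(n):
    """Constant risk of (6.1) over the binomial subfamily."""
    return 1 / (4 * (1 + sqrt(n)) ** 2)

n = 9
bernoulli_risk = risk_61(n, 0.3, 0.3 * 0.7)   # two-point distribution on {0, 1}
uniform_risk = risk_61(n, 0.5, 1 / 12)        # uniform distribution on [0, 1]
```

The two-point distribution attains the minimax value, while the uniform distribution, having smaller variance for the same mean, stays below it.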

Next let us consider estimating the difference of the average means in two 
groups of variables. 

THEOREM 6.4. Let $X_1, \ldots, X_n$; $Y_1, \ldots, Y_n$ be independently distributed over
the interval $[0, 1]$ according to a joint distribution belonging to a family $\mathcal{F}$. Suppose
that $\mathcal{F}$ contains the subfamily $\mathcal{F}_0$, according to which $X_1, \ldots, X_n$; $Y_1, \ldots, Y_n$
are two samples with $P(X_i = 1) = p_1$, $P(X_i = 0) = 1 - p_1$; $P(Y_i = 1) = p_2$,
$P(Y_i = 0) = 1 - p_2$, $0 \le p_1, p_2 \le 1$. If $E(X_i) = \mu_i$, $E(Y_i) = \nu_i$,
$\frac{1}{n}\sum \mu_i = \bar{\mu}$, $\frac{1}{n}\sum \nu_i = \bar{\nu}$, then the minimax estimate of $\bar{\mu} - \bar{\nu}$ is

(6.2)   $\dfrac{\sqrt{2n}}{1 + \sqrt{2n}}(\bar{X} - \bar{Y})$.

PROOF. Again, since (6.2) is the minimax estimate in the binomial case
(Problem 2 of section 5), we need only verify that its risk is a maximum in $\mathcal{F}_0$. But

$E[A(\bar{X} - \bar{Y}) - (\bar{\mu} - \bar{\nu})]^2$
$= E[A(\bar{X} - \bar{\mu}) - A(\bar{Y} - \bar{\nu}) + (A - 1)(\bar{\mu} - \bar{\nu})]^2$
$= A^2(\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2) + (A - 1)^2(\bar{\mu} - \bar{\nu})^2$,

of which we already have shown that it is maximized in the binomial case.

Up to now we assumed the variables to be bounded. Let us now suppose in- 
stead that the variances are bounded. With this assumption we can give an 
analogue of the classical Markoff theorem on least squares. 

THEOREM 6.5. Suppose that $X_1, \ldots, X_n$ are independently distributed according
to a joint distribution belonging to some family $\mathcal{F}$, which contains the subfamily
$\mathcal{F}_0$ where the $X$'s are normal with variance $M^2$. Suppose that for all distributions in
$\mathcal{F}$, $E(X_i) = \sum_{j=1}^{s} a_{ij}\theta_j$ and $\sigma_{X_i}^2 \le M^2$. We assume the matrix $(a_{ij})$ to be known
and of rank $s \le n$. Then the estimate $[f_1(X), \ldots, f_s(X)]$ of $(\theta_1, \ldots, \theta_s)$ which
minimizes $\sup E \sum_{i=1}^{s} [f_i(X) - \theta_i]^2$ is the Markoff estimate.

PROOF. Consider first the subfamily $\mathcal{F}_0$. Then there exists an orthogonal
transformation to $Y_1, \ldots, Y_n$ such that $E(Y_i) = k_i\theta_i$ for $i = 1, \ldots, s$, where
$k_i > 0$; $E(Y_i) = 0$ for $i = s + 1, \ldots, n$; and $\sigma_{Y_i} \le M$ for $i = 1, \ldots, n$.
Then $(Y_1, \ldots, Y_s)$ is a sufficient statistic for $(\theta_1, \ldots, \theta_s)$, and it is easily shown,
using the methods of [6], that $\left(\dfrac{Y_1}{k_1}, \ldots, \dfrac{Y_s}{k_s}\right)$ is the minimax estimate for
$(\theta_1, \ldots, \theta_s)$. But this is the Markoff estimate. In order to complete the proof we
must show that the risk of this estimate takes on in $\mathcal{F}_0$ its supremum over $\mathcal{F}$. But
this is immediate; for

$E \sum_{i=1}^{s} [f_i(X) - \theta_i]^2 = E \sum_{i=1}^{s} \left(\dfrac{Y_i}{k_i} - \theta_i\right)^2 \le M^2 \sum_{i=1}^{s} \dfrac{1}{k_i^2}$.
In a similar manner it is easily shown that the least squares estimate for a
linear function of one or more of the $\theta$'s is the minimax estimate.

Theorem 6.5 gives a justification of the least squares estimate different from 
that of the Markoff theorem. In the Markoff theorem, it is shown that the least 
squares estimate has uniformly smallest risk among all linear unbiased estimates; 
here it is shown that the least squares estimate minimizes the maximum risk 
among all estimates. (The assumptions concerning variances also differ.) 


7. Prediction problems. Frequently one is interested in estimating the value 
of a random variable rather than that of a parameter. A customary method for 
this is to estimate the expectation of the random variable (a parameter) and then 
to “identify” the variable and its expectation; i.e., to use the estimate of the 
expectation as a prediction for the variable. As we shall see below one is led to 
this procedure if one adopts the point of view of unbiased estimation, so that 
from this point of view prediction poses no new problem. This however is no 
longer true when one employs the minimax principle. 

Consider a pair $X$, $Y$ of random variables having a joint distribution $P$
belonging to a family $\mathcal{F}$ of distributions. It is desired to use the observed $X$ to
predict, say, $g(Y)$. We are interested in minimax predictions; i.e., functions
$f(X)$ which minimize $\sup_{P \in \mathcal{F}} E_P W[g(Y), f(X)]$. To obtain minimax predictions
we need the following analogue of Theorem 2.1.

THEOREM 7.1. Let $\{P_\theta\}$, $\theta \in \omega$, be a parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a
probability measure over $\omega$. Suppose that $f$ is such that
$\int E_\theta W[g(Y), f(X)]\, d\lambda(\theta)$ is minimum, and that

(i) $E_\theta W[g(Y), f(X)]$ is constant, say $= c$, for all $\theta \in \omega$,

(ii) $E_P W[g(Y), f(X)] \le c$ for all $P \in \mathcal{F}$.

Then $f$ is a minimax prediction for $g(Y)$.

The proof is the exact analogue of that of theorem 2.1. 

COROLLARY 7.2. A constant risk Bayes prediction is a minimax prediction.

Suppose now that $X$ and $Y$ are independent and that $W[g(y), f(x)] =
(g(y) - f(x))^2$. Consider the problem first from the point of view of unbiasedness.
A prediction could reasonably be called unbiased if $E_P f(X) = E_P g(Y)$. Subject to
unbiasedness, the risk is given by $E_P[g(Y) - f(X)]^2 = \sigma_P^2 f(X) + \sigma_P^2 g(Y)$.
But $\sigma_P^2 g(Y)$ is a known function of $P$, and hence the problem of minimizing
(for a particular $P$) the expected squared error reduces to that of finding an
unbiased estimate of $E_P g(Y)$ with minimum variance at $P$. In a similar way one
sees, without any restriction to unbiased predictions, that the Bayes prediction
for $g(Y)$ is the same as the Bayes estimate for $E_P g(Y)$, and hence that formula
(4.2), with $g(P)$ replaced by $E_P g(Y)$, may be used if the assumptions there made
are valid.

One might expect that as in the unbiased theory the prediction will coincide
with the estimate. This however is not the case since the $\lambda$'s that give constant
risk in the two cases will usually be distinct. In fact the two problems are rather
different in that the "least favorable" $\lambda$ for the prediction problem must not
only take into account the difficulty of finding the correct value of $\theta$ for various
a priori distributions but also the difficulty of predicting $g(Y)$ when $\theta$ is known.

As a first example consider the prediction analogue of problem 1 of section 5.
Let $X$, $Y$ be independent binomial variables such that
$P(X = k) = \binom{m}{k} p^k (1 - p)^{m-k}$ and $P(Y = l) = \binom{n}{l} p^l (1 - p)^{n-l}$. We shall obtain the
minimax prediction of $Y/n$ in a manner quite analogous to the one in which we
determined the minimax estimate of $p$. Actually, the present problem is a
generalization of the earlier one, to which it can be reduced by letting $n \to \infty$. First
it is easily seen that

$E\left(\alpha\dfrac{X}{m} + \beta - \dfrac{Y}{n}\right)^2$

is a quadratic function of $p$, which when $m > 1$ is constant for

$\alpha = \dfrac{m}{m - 1}\left[1 - \sqrt{\dfrac{1}{m} + \dfrac{1}{n} - \dfrac{1}{mn}}\,\right]$,   $\beta = \dfrac{1 - \alpha}{2}$.

But we have already seen that $\alpha\dfrac{X}{m} + \beta$ is the Bayes solution corresponding to
$C p^{a-1} q^{b-1}$ where $a = \dfrac{m\beta}{\alpha}$, $b = \dfrac{m(1 - \alpha - \beta)}{\alpha}$. Clearly $\beta = \dfrac{1 - \alpha}{2}$ when
$a = b$, and $a > 0$ provided $0 < \alpha < 1$, which is easily verified when $m, n > 1$.
We note that as $n \to \infty$, the values of $\alpha$, $\beta$ tend to those of the minimax estimate
of $p$.

When $m = 1$, $E\left(\alpha X + \beta - \dfrac{Y}{n}\right)^2$ is constant for $\alpha = \dfrac{n - 1}{2n}$, $\beta = \dfrac{1 - \alpha}{2}$,
and again $\alpha X + \beta$ is the Bayes estimate for a beta distribution when $n > 1$, and
hence minimax.

Finally in the case $n = 1$, the situation degenerates. Since $E(\frac{1}{2} - Y)^2 = \frac{1}{4}$, the
prediction $f(X) = \frac{1}{2}$ has constant risk. In addition it is the Bayes prediction
corresponding to the distribution which assigns probability 1 to $p = \frac{1}{2}$. Hence
in this case, regardless of the value of $X$ one would predict for $Y$ the value $\frac{1}{2}$.
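The constant-risk property of this prediction can be checked by exact summation over both binomial distributions. The sketch below assumes $X$ binomial with $m$ trials, $Y$ binomial with $n$ trials, and the prediction $\alpha X/m + \beta$ of $Y/n$; the function names are illustrative.

```python
from math import comb, sqrt

def prediction_coefficients(m, n):
    """Coefficients (alpha, beta) giving constant risk for predicting Y/n by alpha * X/m + beta."""
    if m == 1:
        alpha = (n - 1) / (2 * n)
    else:
        alpha = m / (m - 1) * (1 - sqrt(1 / m + 1 / n - 1 / (m * n)))
    return alpha, (1 - alpha) / 2

def binom_pmf(n, k, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def prediction_risk(m, n, p, alpha, beta):
    """Exact E(alpha * X/m + beta - Y/n)^2 for independent X ~ Bin(m, p), Y ~ Bin(n, p)."""
    return sum(
        binom_pmf(m, k, p) * binom_pmf(n, l, p) * (alpha * k / m + beta - l / n) ** 2
        for k in range(m + 1)
        for l in range(n + 1)
    )

alpha, beta = prediction_coefficients(4, 4)
risks = [prediction_risk(4, 4, p, alpha, beta) for p in (0.0, 0.2, 0.5, 0.9, 1.0)]
```

For very large `n` the coefficient `alpha` approaches $\sqrt{m}/(1 + \sqrt{m})$, the coefficient of $X/m$ in the minimax estimate of $p$.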

It is interesting that the above prediction problem can be interpreted also as
an estimation problem in the following manner. Suppose a lot of size $N = m + n$
is such that the number of defectives follows a binomial distribution; this is the
case when the items making up the lot are produced by a manufacturing process
that is in statistical control. It is desired to estimate from a sample of size $m$
taken from this lot, the proportion of defectives in the remainder. That this is
equivalent to the prediction problem treated above follows from a remark of
Mood [13] that in such a lot the number of defectives in the sample and in the
remainder are independently distributed according to binomial distributions with
common $p$.
We can again use the binomial results to obtain the solutions of certain
nonparametric problems. For example, let $X_1, \ldots, X_m$ be independently and
identically distributed on $[0, 1]$ and let $Y_1, \ldots, Y_n$ be another sample from the
same distribution. Then the minimax prediction for $\bar{Y}$ is given by $\alpha\bar{X} + \beta$ with

$\alpha = \dfrac{m}{m - 1}\left[1 - \sqrt{\dfrac{1}{m} + \dfrac{1}{n} - \dfrac{1}{mn}}\,\right]$,   $\beta = \dfrac{1 - \alpha}{2}$.

This follows from the fact that

$E(\alpha\bar{X} + \beta - \bar{Y})^2 = E[\alpha(\bar{X} - \mu) - (\bar{Y} - \mu) + \beta + (\alpha - 1)\mu]^2$
$= \left(\dfrac{\alpha^2}{m} + \dfrac{1}{n}\right)\sigma^2 + [\beta + (\alpha - 1)\mu]^2$
$\le \left(\dfrac{\alpha^2}{m} + \dfrac{1}{n}\right)\mu(1 - \mu) + [\beta + (\alpha - 1)\mu]^2$.

An analogous modification clearly is possible for theorem 6.4.
For the situation considered in 6.5, the prediction problem gives the same
result as the estimation problem. For consider first two samples $X_1, \ldots, X_m$;
$Y_1, \ldots, Y_n$ from a normal distribution with known variance $\sigma^2$. Here

$E_\theta[f(X_1, \ldots, X_m) - \bar{Y}]^2 = E_\theta[f(X_1, \ldots, X_m) - \theta]^2 + \dfrac{\sigma^2}{n}$,

and hence the risk differs from that of the estimation problem only by a
constant. Thus $\bar{X}$ is the minimax prediction of $\bar{Y}$, and it is then seen immediately
that it is also the minimax prediction for $\bar{Y}$ when of the underlying common
distribution of the $X$'s and $Y$'s it is assumed only that the variance is bounded.
THE THEORY OF PROBABILITY DISTRIBUTIONS OF POINTS
ON A LATTICE [1]


By P. V. KRISHNA IYER
University of Oxford


1. Introduction and summary. This paper discusses the theory of certain
probability distributions arising from points arranged in the form of lattices
in two, three and higher dimensions. The points are of $k$ characters which for
convenience are described as colors. A two-dimensional lattice will consist of
$m \times n$ points in $m$ columns and $n$ rows. In a three-dimensional lattice there
will be $l \times m \times n$ points in the form of a rectangular parallelepiped. Two
situations arise for consideration. They are, to use the term of Mahalanobis,
free and non-free sampling. In free sampling the color of each point is determined,
on the null hypothesis, independently of the color of the other points. The
probabilities of the points belonging to the different colors, say black, white, etc.,
are $p_1, p_2, \ldots, p_k$, such that $\sum_{1}^{k} p_r = 1$. In non-free sampling the number of
points of each color is specified in advance, say $n_1, n_2, \ldots, n_k$, so that
$\sum_{1}^{k} n_r = mn$ or $lmn$ according as the lattice is two- or three-dimensional. Only
the arrangements of these points in the lattice are varied.

The distributions considered in this paper are the following:— 

(i) the number of joins between adjacent points of the same color, say 
black-black joins, 

(ii) the number of joins between adjacent points of two specified colors, say 
black-white joins, and 

(iii) the total number of joins between points of different colors, along
mutually perpendicular axes.

The methods used here are the same as those developed by the author [3]
for the linear case. All the distributions tend to the normal form when $l$, $m$ and $n$
tend to infinity, provided the $p$'s are not very small.

Before considering the various distributions, we shall have a brief review of
the work done on this topic by other people. For free sampling, Moran [5] and
[6] has discussed the distribution of black-white and black-black joins for an
$m \times n$ lattice of points of two colors. For a three-dimensional lattice, he has
given the first and the second moments for the distribution of black-white
joins. Levene [4] has announced some results closely allied to those of Moran
for a square of side $N$ (with $N^2$ cells) each cell taking the characteristic A or B
with probabilities $p$ and $q = 1 - p$ respectively. Bose [2] has found the
expectation of

x = the number of black patches - the number of embedded white patches,

for a square divided into $n^2$ small cells, having $p$ and $q = 1 - p$ as the probability
of the cells being black or white. An embedded white patch is one that lies
completely inside a black patch.

[1] Part of a thesis approved for the degree of Doctor of Philosophy, Oxford University.

The above review shows that the work done so far is confined entirely to the
free sampling distributions, the points taking only two characters. As mentioned
in the beginning of this article, we shall deal here with the free and non-free
sampling distributions for points possessing $k$ characters or colors.


2. Two dimensional lattice. Let an $m \times n$ rectangular lattice consist of $mn$
points of $k$ colors with probabilities $p_1, p_2, \ldots, p_k$, such that $\sum p_r = 1$. (When
there are only two colors, $p_1$ and $p_2$ are taken as $p$ and $q$ respectively.) All the
problems dealt with for the linear lattice (Krishna Iyer, [3]) can be investigated
here also. But the most important of them is the distribution for the total number 
of joins between points of different colors. This takes into consideration the 
relative position of points of all colors in the lattice. Distributions for the number 
of black-black or black-white joins are not based on the arrangement of all the 
points in the lattice and therefore cannot be considered to be adequate for testing 
the random distribution of the points in the lattice. Therefore the distribution 
of the total number of joins between points of different colors has been dealt 
with in some detail. As the actual distributions are very complicated they 
are discussed by means of cumulants. The first and the second moments for 
the other distributions have also been given. 

2.1. First and second moments for the distribution of black-black joins for two 
or more colors. The first and the second moments for free sampling have been 
obtained by Moran [5] and [6]. In order to give an idea of the methods used 
in this paper for obtaining the moments and also to facilitate the derivation of 
the corresponding moments for non-free sampling, they have been obtained 
again for both black-black and black-white joins. 

(a) Free Sampling. In the course of similar investigations on the distribution 
of black-black joins arising from points on a line, the author [3] has found that 
the rth factorial moment is r! times the sum of expectations of the different 


ways of obtaining r joins. This finding is true for the rectangular lattice also. 
This may be established as follows. 


Define variates $u_{ij'}$ ($i = 1, 2, \ldots, n$; $j' = 1, 2, \ldots, m - 1$) to be one if
the $(i, j')$ and $(i, j' + 1)$ positions are black and zero otherwise; then $E(u_{ij'}) = p^2$,
and the higher factorial moments are zero. Similarly, define $v_{i'j}$ ($i' = 1, 2, \ldots,
n - 1$; $j = 1, 2, \ldots, m$) to be one when the $(i', j)$ and $(i' + 1, j)$ positions are
black and zero otherwise; then $E(v_{i'j}) = p^2$, and the higher factorial moments
are zero. Further, $u_{ij'}$ is independently distributed of all $u$'s and $v$'s except
$u_{i,j'-1}$, $u_{i,j'+1}$, $v_{i-1,j'}$, $v_{i,j'}$, $v_{i-1,j'+1}$, $v_{i,j'+1}$, and $v_{i'j}$ is independently
distributed of all $u$'s and $v$'s excepting two vertically adjacent $v$'s and four
horizontally adjacent $u$'s. If

$s = \sum u_{ij'} + \sum v_{i'j}$,

then

$E(s) = \sum_{i,j'} p^2 + \sum_{i',j} p^2 = (2mn - m - n)p^2$

and

$E[s(s - 1)] = 2E$(the number of ways of selecting any two of the ones included in
$\sum u_{ij'} + \sum v_{i'j}$)
$= 2E\left(\sum uu + \sum uv + \sum vv\right)$

involves only the cross products since $E[u(u - 1)] = 0 = E[v(v - 1)]$. For products of
dependent pairs the expectation is $p^3$, while for independent pairs it is $p^4$. Hence
one merely needs to count the number of dependent and independent products.
Similarly for the third factorial moment one needs consider only products of
three first powers of the variates (with expectation $p^6$), those with two dependent
and one independent variates (with expectation $p^5$), and those with three
dependent variates (with expectation $p^4$).

Thus the second factorial moment can be obtained by counting the number
of ways of obtaining two black-black joins from (i) three adjacent points and
(ii) two pairs of adjacent points. They are explained below diagrammatically
for a $5 \times 4$ lattice.

[Diagram: arrangements (1), (2) and (3) of two black-black joins on a $5 \times 4$ lattice.]

'X' denotes a black point.
'.' denotes any point other than black. The expectations for items (1), (2)
and (3) indicated above are

(2.1.1)   $[(m - 2)n + (n - 2)m]p^3$,
          $4(m - 1)(n - 1)p^3$,
          $\frac{1}{2}[4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]p^4$,

respectively. Thus

(2.1.2)   $\mu'_{[2]} = 2[6mn - 6(m + n) + 4]p^3$
          $+ [4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]p^4$.

It can now be seen that

(2.1.3)   $\mu'_1 = (2mn - m - n)p^2$,
(2.1.4)   $\mu_2 = (2mn - m - n)p^2 + 2(6mn - 6m - 6n + 4)p^3 - (14mn - 13m - 13n + 8)p^4$.

Putting $m + n = a$, and $mn = b$, the above expressions reduce to

(2.1.5)   $\mu'_1 = (2b - a)p^2$,
(2.1.6)   $\mu_2 = (2b - a)p^2 + 2(6b - 6a + 4)p^3 - (14b - 13a + 8)p^4$.

These substitutions have been continued throughout this Section.
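Expressions (2.1.5) and (2.1.6) can be checked on a small lattice by enumerating every coloring. The following is only a sketch: it assumes each point is black with probability $p$ and "not black" otherwise (which is all that matters for black-black joins), and it assumes $m, n \ge 2$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def lattice_joins(m, n):
    """Cells and adjacent pairs (joins) of a lattice with m columns and n rows."""
    cells = [(i, j) for i in range(n) for j in range(m)]
    horizontal = [((i, j), (i, j + 1)) for i in range(n) for j in range(m - 1)]
    vertical = [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(m)]
    return cells, horizontal + vertical

def bb_moments_free(m, n, p):
    """Exact mean and variance of the number of black-black joins under free sampling."""
    cells, joins = lattice_joins(m, n)
    mean = second = Fraction(0)
    for colors in product((0, 1), repeat=len(cells)):   # 1 = black
        black = dict(zip(cells, colors))
        k = sum(colors)
        prob = p ** k * (1 - p) ** (len(cells) - k)
        s = sum(black[u] and black[v] for u, v in joins)
        mean += prob * s
        second += prob * s * s
    return mean, second - mean ** 2

def bb_formulas(m, n, p):
    """Mean (2.1.5) and variance (2.1.6) with a = m + n, b = mn."""
    a, b = m + n, m * n
    mu1 = (2 * b - a) * p ** 2
    mu2 = mu1 + 2 * (6 * b - 6 * a + 4) * p ** 3 - (14 * b - 13 * a + 8) * p ** 4
    return mu1, mu2

p = Fraction(1, 3)
enumerated = bb_moments_free(3, 3, p)
stated = bb_formulas(3, 3, p)
```

The enumeration grows as $2^{mn}$, so it is practical only for very small lattices such as $3 \times 3$ or $4 \times 2$.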

(b) Non-free sampling. The chances of obtaining $r$ black points in free and
non-free sampling are $p^r$ and $n_1^{(r)}/b^{(r)}$ respectively. Therefore it is obvious that
the rth factorial moment about zero for non-free sampling distribution of
black-black joins can be deduced by substituting $n_1^{(r)}/b^{(r)}$ for $p^r$ in $\mu'_{[r]}$ for free sampling.
This substitution gives

(2.1.7)   $\mu'_{1(n_1,n_2)} = \dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}}$,

(2.1.8)   $\mu_{2(n_1,n_2)} = \dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}} + \dfrac{2(6b - 6a + 4)\,n_1^{(3)}}{b^{(3)}}$
          $- \dfrac{\{(14b - 13a + 8) - (2b - a)^2\}\,n_1^{(4)}}{b^{(4)}} - \left\{\dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}}\right\}^2$,

where $\mu_{r(n_1,n_2)}$ represents the rth moment with $n_1$ black and $n_2$ white points
on the lattice.
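The non-free sampling moments (2.1.7) and (2.1.8) can be checked in the same way by listing every placement of the black points. In this sketch $x^{(r)}$ is read as the falling factorial $x(x - 1)\cdots(x - r + 1)$, the lattice is assumed to have $m, n \ge 2$, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def falling(x, r):
    """Falling factorial x(x - 1)...(x - r + 1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def bb_moments_nonfree(m, n, n1):
    """Exact mean and variance of black-black joins when exactly n1 points are black."""
    cells = [(i, j) for i in range(n) for j in range(m)]
    joins = ([((i, j), (i, j + 1)) for i in range(n) for j in range(m - 1)]
             + [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(m)])
    values = []
    for blacks in combinations(cells, n1):   # all placements are equally likely
        chosen = set(blacks)
        values.append(sum(u in chosen and v in chosen for u, v in joins))
    count = len(values)
    mean = Fraction(sum(values), count)
    variance = Fraction(sum(v * v for v in values), count) - mean ** 2
    return mean, variance

def bb_formulas_nonfree(m, n, n1):
    """Mean (2.1.7) and variance (2.1.8) with a = m + n, b = mn."""
    a, b = m + n, m * n
    r2, r3, r4 = (Fraction(falling(n1, r), falling(b, r)) for r in (2, 3, 4))
    mu1 = (2 * b - a) * r2
    mu2 = (mu1 + 2 * (6 * b - 6 * a + 4) * r3
           - ((14 * b - 13 * a + 8) - (2 * b - a) ** 2) * r4 - mu1 ** 2)
    return mu1, mu2

enumerated = bb_moments_nonfree(3, 3, 4)
stated = bb_formulas_nonfree(3, 3, 4)
```

On a $3 \times 3$ lattice with four black points there are 126 placements, so the enumeration is immediate.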

2.2. Cumulants for the distribution of black-white joins for two colors. For $m$
points on a line, the author [3] has shown that the first four cumulants of the
free and non-free sampling distribution of black-white joins can be obtained
from the non-free distributions for $(1, m - 1)$, $(2, m - 2)$, $(3, m - 3)$
and $(4, m - 4)$ black and white points distributed at random. This method is
applicable for two and three dimensional lattices also. This can be established from
the following considerations.

(i) The rth moment about zero for the free sampling distribution is

$\sum_{s=0}^{b} \binom{b}{s} p^s q^{b-s}\, S_{r(s,b-s)}$,

where $b = mn$ and $S_{r(s,b-s)}$ is the rth moment for the non-free distribution with
$s$ black and $(b - s)$ white points.

(ii) $S_{r(s,b-s)}$ is the same for the two distributions arising from (1) $s$ black and
$(b - s)$ white points and (2) $(b - s)$ black and $s$ white points.

(iii) The rth moment is a polynomial in $pq$ of degree $r$. This can be seen from
the fact that the factorial moment is the sum of the expectations of the different
ways of obtaining $r$ black-white joins. The probability of $r$ independent
black-white joins is $(2pq)^r$ and this is the highest power of $pq$.

In view of the above conditions, (i) reduces to

(2.2.1)   $A_{1r}\,pq(p + q)^{b-2} + A_{2r}\,p^2q^2(p + q)^{b-4} + \cdots + A_{rr}\,p^rq^r(p + q)^{b-2r}$
          $= A_{1r}\,pq + A_{2r}\,p^2q^2 + \cdots + A_{rr}\,p^rq^r$,

where $A_{1r}$, $A_{2r}$, etc. are determined from the following relations:

(2.2.2)   $\binom{b}{1} S_{r(1,b-1)} = A_{1r}$,
          $\binom{b}{2} S_{r(2,b-2)} = A_{2r} + \binom{b-2}{1} A_{1r}$,
          $\binom{b}{3} S_{r(3,b-3)} = A_{3r} + \binom{b-4}{1} A_{2r} + \binom{b-2}{2} A_{1r}$,
          $\binom{b}{4} S_{r(4,b-4)} = A_{4r} + \binom{b-6}{1} A_{3r} + \binom{b-4}{2} A_{2r} + \binom{b-2}{3} A_{1r}$,

where $S_{r(t,b-t)}$ is the rth moment about zero for the non-free distribution with
$t$ black and $(b - t)$ white points. This is obvious by comparing the coefficients
of $p^t q^{b-t}$ in (i) with (2.2.1).

[2] This result differs slightly from that given by Moran. The correct result is the one
given here.

Therefore the first four cumulants can be calculated by finding the frequency
distributions of black-white joins for $(1, b - 1)$, $(2, b - 2)$, $(3, b - 3)$ and
$(4, b - 4)$ black and white points. These distributions were determined by a
systematic examination of the number of black-white joins in all the possible
arrangements for the given number of black and white points. The moments
of these distributions enable us to determine the $A$'s.


$A_{11} = 2(2b - a)$,

$A_{12} = 2(8b - 7a + 4)$,

$A_{13} = 2(32b - 37a + 36)$,

$A_{14} = 2(128b - 175a + 220)$,

$A_{22} = 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8)$,

$A_{23} = 4(21a^2 - 66ab + 48b^2 + 210a - 156b - 228)$,

$A_{33} = 8(-a^3 + 6a^2b - 12ab^2 + 8b^3 - 39a^2 + 120ab - 84b^2 - 272a + 184b + 312)$,

$A_{24} = 4(295a^2 - 760ab + 448b^2 + 2305a - 1304b - 3428)$,

$A_{34} = 8(-42a^3 + 216a^2b - 360ab^2 + 192b^3 - 1410a^2 + 3612ab - 2016b^2
- 7884a + 3648b + 12720)$,

$A_{44} = 16(a^4 - 8a^3b + 24a^2b^2 - 32ab^3 + 16b^4 + 78a^3 - 396a^2b + 648ab^2 - 336b^3
+ 1643a^2 - 4196ab + 2252b^2 + 7926a - 3084b - 13464)$,

where $a = m + n$, and $b = mn$.

The above values of the $A$'s give the first four moments for free sampling about
zero. The cumulants reduce to the following expressions:














(2.2.3) K, = 2(2b — a)pgq, 

(2.2.4) Kk. = 2(8b — 7a + 4)pq — 4(14b — 13a + 8)p’q’, 

(2.2.5) Ks = 2(32b — 370 + 36)pq — 8(90b — 1lla + 114)p’q° 
+ 64(29) — 37a + 39)p’q’, 

(2.2.6) Kk, = 2(128b — 175a + 220)pq — 4(1784b — 2617a + 3476)p’¢° 
+ 32(1548b — 2361a + 3228)p*q° 

— 32(3126b — 4899a + 6828)p‘q". 
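The cumulant formulas (2.2.3) to (2.2.6) can be checked on very small lattices by listing every colouring. The sketch below is only an illustration: the function names are hypothetical, exact fractions are used to avoid rounding, and the comparison is made at p = q = 1/2, where the 2 × 3 lattice should give 7/2, 7/4 and 0 for the first three cumulants.

```python
from fractions import Fraction
from itertools import product


def lattice_joins(m, n):
    """Pairs of neighbouring points in an m x n rectangular lattice."""
    joins = []
    for i in range(m):
        for j in range(n):
            if i + 1 < m:
                joins.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                joins.append(((i, j), (i, j + 1)))
    return joins


def enumerated_cumulants(m, n, p):
    """First four cumulants of the black-white join count, free sampling,
    obtained by listing all 2**(m*n) colourings (p = chance of black)."""
    q = 1 - p
    points = [(i, j) for i in range(m) for j in range(n)]
    joins = lattice_joins(m, n)
    raw = [Fraction(0)] * 5                      # raw[r] = r-th moment about zero
    for colours in product((0, 1), repeat=len(points)):
        colour = dict(zip(points, colours))
        blacks = sum(colours)
        weight = p ** blacks * q ** (len(points) - blacks)
        x = sum(colour[u] != colour[v] for u, v in joins)
        for r in range(1, 5):
            raw[r] += weight * x ** r
    m1, m2, m3, m4 = raw[1:]
    return (m1,
            m2 - m1 ** 2,
            m3 - 3 * m1 * m2 + 2 * m1 ** 3,
            m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4)


def formula_cumulants(m, n, p):
    """Cumulants (2.2.3)-(2.2.6) with a = m + n, b = mn."""
    a, b = m + n, m * n
    t = p * (1 - p)
    k1 = 2 * (2 * b - a) * t
    k2 = 2 * (8 * b - 7 * a + 4) * t - 4 * (14 * b - 13 * a + 8) * t ** 2
    k3 = (2 * (32 * b - 37 * a + 36) * t
          - 8 * (90 * b - 111 * a + 114) * t ** 2
          + 64 * (29 * b - 37 * a + 39) * t ** 3)
    k4 = (2 * (128 * b - 175 * a + 220) * t
          - 4 * (1784 * b - 2617 * a + 3476) * t ** 2
          + 32 * (1548 * b - 2361 * a + 3228) * t ** 3
          - 32 * (3126 * b - 4899 * a + 6828) * t ** 4)
    return k1, k2, k3, k4
```

The enumeration grows as 2^(mn), so it is only practical as a spot check for lattices of about a dozen points.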





As indicated for black-black joins, the first and the second moments for
non-free sampling can be calculated by substituting

$p^rq^s = n_1^{(r)}n_2^{(s)}/b^{(r+s)}$

in the uncorrected moments about the origin for free sampling. This is true for
all the distributions considered in this paper.

Before proceeding to discuss the limiting form of the distribution, it may be
noted that the first four cumulants for the free-sampling distribution of
black-white joins are linear expressions in a and b. This result is similar to what has
been established for the linear lattice (Krishna Iyer, [3]). When the points
lie on a line, all the cumulants of the distribution of the number of joins
(black-black or black-white) are linear in m (the number of points on the line). This
suggests that the higher order cumulants for the distribution of joins in a
rectangular lattice also will be linear in a and b, i.e. the rth cumulant will be of the
form

$\sum_s (L_{rs}b + M_{rs}a + N_{rs})p^sq^s,$

where L, M and N are independent of a and b. It has not been possible to obtain
a formal proof for this statement.

The limiting form of the distribution of the number of black-white joins is
now examined on the basis of the cumulants given above. Since $\kappa_2$, $\kappa_3$ and $\kappa_4$ are
linear in a and b, $\gamma_1$ and $\gamma_2$ tend to the limit zero as m and n tend to infinity.
That the higher order $\gamma$'s also tend to the limit zero can be seen from the fact
that all the cumulants will be linear functions in a and b. Hence the distribution
of

$\dfrac{x - 2(2b - a)pq}{\sqrt{2(8b - 7a + 4)pq - 4(14b - 13a + 8)p^2q^2}}$

tends to the normal form as m and n tend to infinity, where x is the observed
number of black-white joins in a given arrangement of the points.

When p = q = 1/2, the first, second and third cumulants are equal to those
obtained for a binomial distribution whose 'n' is (2b - a).

As in the case of linear lattices, the distribution of the number of black-white
joins in an m × n rectangular lattice for non-free sampling also will tend
to the normal form as m and n tend to infinity.




























TABLE 1
Distribution of the number of black-white joins for 2 × 3 lattice

[Table body not recoverable. Columns: No. of B-W joins; No. of black points 0 to 6; Total.]

$\kappa_1 = 7/2$, $\kappa_2 = 7/4$, $\kappa_3 = 0$.

In order to have an idea of the nature of the distribution of the number of
black-white joins when p = q or otherwise, the complete distributions for the
lattices 2 × 3, 2 × 4, 3 × 3 and 3 × 4 are given in Tables 1, 2, 3, and 4.

The distributions tabulated in Tables 1, 2, 3 and 4 show that the probability
of getting 1 and (2b - a - 1) black-white joins is zero, while for 0 and
(2b - a) joins it is not so. But this abnormality will not affect the limiting
form of the distribution when m and n tend to infinity because the probability
for 0 and (2b - a) black-white joins also tends to zero.

2.3. First and second moments for the distribution of black-white joins for k
colors. Free sampling. Taking $p_1$ and $p_2$ as the probabilities that a point in the
lattice is black or white, the expected number of black-white joins is

(2.3.1) $2(2b - a)p_1p_2.$














PROBABILITY DISTRIBUTIONS 


TABLE 2
Distribution of the number of black-white joins for 2 × 4 lattice

[Table body not recoverable. Columns: No. of joins; No. of black points.]

$\kappa_1 = 5$, $\kappa_2 = 5/2$, $\kappa_3 = 0$.


TABLE 3
Distribution of the number of black-white joins for 3 × 3 lattice

[Table body not recoverable. Columns: No. of joins; No. of black points.]

P. V. KRISHNA IYER 


TABLE 4
Distribution of the number of black-white joins for 4 × 3 lattice

[Table body not recoverable. Columns: No. of joins; No. of black points.]

$\kappa_2 = 4.25$.


TABLE 5
Frequency distribution of the total number of joins between points of different
colors for 1 black, 1 white and (mn - 2) red points

No. of joins    Frequency
     4          $28$
     5          $4(5a - 26)$
     6          $2(2a^2 - 25a + 4b + 56)$
     7          $2(-4a^2 + 2ab + 17a - 6b - 12)$
     8          $(4a^2 - 4ab + b^2 - 4a + 3b - 12)$





As in the case of black-black joins, the second factorial moment about zero
is twice the sum of the expectations of the different ways of forming two
black-white joins and can be determined by the method described in section 2.1.

(2.3.2) $\mu'_{[2]} = 2(6b - 6a + 4)p_1p_2(p_1 + p_2) + 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8)p_1^2p_2^2.$

From this, $\mu_2$ works out to be

(2.3.3) $\mu_2 = 2(2b - a)p_1p_2 + 2(6b - 6a + 4)p_1p_2(p_1 + p_2) - 4(14b - 13a + 8)p_1^2p_2^2.$


2.4. First and second moments for the distribution of the total number of joins
between points of different colors for three colors. The expectation for free sampling
is

(2.4.1) $\mu'_1 = 2(2b - a)\sum p_rp_s.$


The coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ in the second moment are the same as those for
two colors. The coefficient of $p_1p_2p_3$ can be obtained from the frequency distribution
of the total number of joins between points of different colors when there
are 1 black, 1 white and (mn - 2) red points in the lattice. See Table 5.
Defining $S_{2(1,1,b-2)} = \sum x^2f$ for the above distribution,

$S_{2(1,1,b-2)} = 2(4a^2 - 30ab + 32b^2 + 55a - 54b - 32).$

As in the case of two colors, the second moment about zero for three colors
reduces to the form

$A_{21}(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s + A'_{21}(p_1 + p_2 + p_3)^{b-3}p_1p_2p_3 + A_{22}(p_1 + p_2 + p_3)^{b-4}\sum p_r^2p_s^2$
$\qquad = A_{21}\sum p_rp_s + A'_{21}p_1p_2p_3 + A_{22}\sum p_r^2p_s^2,$

since $p_1 + p_2 + p_3 = 1$.

The coefficient of $p_1^{b-2}p_2p_3$ on the left hand side of the above equation is equal to
$S_{2(1,1,b-2)}$, i.e. $S_{2(1,1,b-2)}$ = sum of the coefficients of $p_1^{b-2}p_2p_3$ in
$A_{21}(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s$ and $A'_{21}(p_1 + p_2 + p_3)^{b-3}p_1p_2p_3$. Therefore the coefficient of $p_1p_2p_3$ in
$\mu_2$ is $S_{2(1,1,b-2)}$ - coefficient of $p_1^{b-2}p_2p_3$ in $2(8b - 7a + 4)(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s$ -
coefficient of $p_1p_2p_3$ in $4(2b - a)^2(\sum p_rp_s)^2$

$= S_{2(1,1,b-2)} - 2(8b - 7a + 4)(2b - 3) - 8(2b - a)^2$
$= 4(17a - 19b - 10).$
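The frequencies of Table 5 and the reduction to 4(17a - 19b - 10) can be reproduced by placing one black and one white point in every possible ordered way. The following is a sketch under stated assumptions: the function names are hypothetical, and the closed forms are taken to apply when m and n are both at least 3, so that corner points are never adjacent to each other.

```python
from itertools import permutations


def table5_counts(m, n):
    """Ordered placements of one black and one white point (all others red)
    on an m x n lattice, tallied by the number of joins between unlike colours."""
    points = [(i, j) for i in range(m) for j in range(n)]

    def degree(pt):
        i, j = pt
        return (i > 0) + (i < m - 1) + (j > 0) + (j < n - 1)

    counts = {}
    for black, white in permutations(points, 2):
        adjacent = abs(black[0] - white[0]) + abs(black[1] - white[1]) == 1
        # every join at the black or the white point links unlike colours;
        # a join shared by the two points must be counted once only
        joins = degree(black) + degree(white) - adjacent
        counts[joins] = counts.get(joins, 0) + 1
    return counts


def table5_formula(m, n):
    """Closed-form frequencies of Table 5 with a = m + n, b = mn."""
    a, b = m + n, m * n
    return {4: 28,
            5: 4 * (5 * a - 26),
            6: 2 * (2 * a * a - 25 * a + 4 * b + 56),
            7: 2 * (-4 * a * a + 2 * a * b + 17 * a - 6 * b - 12),
            8: 4 * a * a - 4 * a * b + b * b - 4 * a + 3 * b - 12}


def second_moment_sum(counts):
    """Sum of x^2 f over the tabulated distribution."""
    return sum(x * x * f for x, f in counts.items())
```

For a 4 × 5 lattice, for instance, `second_moment_sum(table5_counts(4, 5))` can be compared directly with 2(4a² - 30ab + 32b² + 55a - 54b - 32) at a = 9, b = 20.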











It can now be seen that

$\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(14b - 13a + 8)\sum p_r^2p_s^2 - 4(19b - 17a + 10)p_1p_2p_3.$


2.5. First and second moments for the distribution of the total number of joins
between points of different colors for k colors. As in the previous cases, the expectation
for free sampling is

$2(2b - a)\sum p_rp_s.$
The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the second moment are the
same as those for three colors. The coefficient of $\sum p_rp_sp_tp_u$ is determined by finding
the distribution of joins between points of different colors when there are 1
black, 1 white, 1 red and mn - 3 green points in the lattice. See Table 6.

$S_{2(1,1,1,mn-3)} = 2(12a^2b - 69ab^2 + 72b^3 - 36a^2 + 330ab - 342b^2 - 408a + 348b + 240).$

The coefficient of $\sum p_rp_sp_tp_u$ in $\mu_2$ can be obtained on the same lines as explained
for three colors and is equal to $S_{2(1,1,1,mn-3)}$ - coefficient of $p_1^{mn-3}p_2p_3p_4$ in
the homogeneous expression of degree mn in $\mu'_2$ for three colors $+\ 8(2b - a)^2$

$= 8(14b - 13a + 8).$


TABLE 6
Frequency distribution of the total number of joins between points of different
colors when there are 1 black, 1 white, 1 red and (mn - 3) green points

No. of joins    Frequency
     6          $240$
     7          $12(19a - 112)$
     8          $12(6a^2 - 78a + 7b + 208)$
     9          $4(2a^3 - 57a^2 + 15ab + 310a - 66b - 444)$
    10          $6(-4a^3 + 2a^2b + 36a^2 - 21ab + 2b^2 - 86a + 36b + 72)$
    11          $6(4a^3 - 4a^2b + ab^2 - 6a^2 + 8ab - 2b^2 - 10a - 40)$
    12          $(-8a^3 + 12a^2b - 6ab^2 + b^3 - 24a^2 + 18ab - 3b^2 + 44a - 34b + 192)$








It follows now that

(2.5.2) $\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(19b - 17a + 10)\sum p_rp_sp_t - 4(14b - 13a + 8)\sum p_r^2p_s^2 + 8(14b - 13a + 8)\sum p_rp_sp_tp_u.$

In general the cumulants³ for free sampling involve b and a in the first degree
only, and therefore, when m and n are large, the distribution tends to the normal
form. If x is the observed total number of joins between points of different
colors, the distribution of

$\dfrac{x - 2(2b - a)\sum p_rp_s}{\sqrt{b}}$

tends to the normal form with

$16\sum p_rp_s - 76\sum p_rp_sp_t - 56\sum p_r^2p_s^2 + 112\sum p_rp_sp_tp_u$

as its variance for large values of m and n.

For non-free sampling also, the distribution of

$\dfrac{x - 2(2mn - m - n)\sum e_re_s}{\sqrt{mn}},$

where $e_r = n_r/mn$, approaches the normal form having

$4\sum e_re_se_t + 8\sum e_r^2e_s^2 - 16\sum e_re_se_te_u$

as its variance. The error of this variance will be about 5% or less when m
and n are greater than 35.

³ The author has recently obtained the third and fourth cumulants for this distribution.
They are linear functions of the dimensions of the lattice. The results will be published in
an early issue of the Ind. J. Agric. Stat.

















3. Three- and higher-dimensional lattices. This section deals with the first 
and the second moments for the distribution of black-black, black-white and 
the total number of joins between points of different colors for three- and higher- 
dimensional lattices. Besides these, the third and the fourth cumulants for the 
distribution of black-white joins in a three-dimensional lattice with points 
of two colors are also given. 

3.1. First and second moments for the distribution of black-black joins. Free
sampling. Let $E_3(1)$ be the expectation of the number of black-black joins
for a lattice of sides l, m and n. Further let $A_2$ and $A_3$ be the number of ways
of obtaining a black-black join in m × n and l × m × n lattices. Then

$E_3(1) = A_3p^2,$
$A_3 = A_2l + mn(l - 1),$

and

$A_2 = (2mn - m - n).$

Therefore

(3.1.1) $E_3(1) = (3lmn - lm - mn - nl)p^2.$

For the sake of convenience all the results for the three-dimensional lattice
are expressed after making the following substitutions:

$c = l + m + n,$
$d = lm + mn + nl,$
$e = lmn.$

$E_3(1)$ in terms of c, d and e is

$(3e - d)p^2.$
The expectation of the number of black-black joins for a lattice of r dimensions
($l_1 \times l_2 \times \cdots \times l_r$) is given by

(3.1.2) $E_r(1) = (r\,l_1l_2\cdots l_r - \sum l_1l_2\cdots l_{r-1})p^2,$

where $\sum l_1l_2\cdots l_{r-1}$ is the sum of the product of the sides taken (r - 1) at a
time.

It has been pointed out before that the second factorial moment is twice
the sum of the expectations of the different ways of forming two black-black
joins. Using this fact, if $2B_2$, $2B_3$, etc. are the coefficients of $p^3$ in the second
factorial moment for two-, three- and higher-dimensional lattices, it will be
found by direct enumeration made in succession from lattices of lower dimensions
that

$B_r = B_{(r-1)}l_r + 4A_{(r-1)}(l_r - 1) + l_1l_2\cdots l_{(r-1)}(l_r - 2).$

This can be established from the following considerations. 1) Two black-black
joins can be obtained from three black points situated close to one another
and the chance of having three black points in a specified manner is $p^3$. 2) The
number of ways of getting two black-black joins from three points in the lattice
is

$B_{(r-1)}l_r + 4A_{(r-1)}(l_r - 1) + l_1l_2\cdots l_{(r-1)}(l_r - 2).$

$C_r$, the coefficient of $p^4$ in the corrected second moment, is given by the equation

$C_r = -(2B_r + A_r).$

This follows from the fact that the sum of the coefficients of $p^3$ and $p^4$ in the
uncorrected factorial moment, about zero, is twice the number of ways of selecting
two joins from the total number of joins in the lattice, that is, $A_r(A_r - 1)$.
Thus

(3.1.3) $A_rp^2 + 2B_rp^3 + C_rp^4$

is the corrected second moment for the distribution of black-black joins in a
lattice of r dimensions. For an l × m × n lattice

(3.1.4) $\mu_2 = (3e - d)p^2 + 2(15e - 10d + 4c)p^3 - (33e - 21d + 8c)p^4.$
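The recurrence for $B_r$, together with the analogous one for the number of joins, can be compared with a direct count over the points of the lattice. This is a minimal sketch: the function names are hypothetical, the recurrence for $A_r$ is assumed to generalise $A_3 = A_2l + mn(l - 1)$ to any dimension, and every side is assumed to be at least 2.

```python
from itertools import product
from math import comb, prod


def join_counts_recursive(sides):
    """A_r (number of joins) and B_r (pairs of joins sharing a point),
    built up one dimension at a time; every side must be at least 2."""
    a_r, b_r = sides[0] - 1, sides[0] - 2        # linear lattice of l_1 points
    for k in range(1, len(sides)):
        l_k, volume = sides[k], prod(sides[:k])
        # both right-hand sides use the values for the lower dimension
        a_r, b_r = (a_r * l_k + volume * (l_k - 1),
                    b_r * l_k + 4 * a_r * (l_k - 1) + volume * (l_k - 2))
    return a_r, b_r


def join_counts_direct(sides):
    """The same two numbers obtained by visiting every lattice point."""
    total_degree = pairs = 0
    for point in product(*(range(s) for s in sides)):
        degree = sum((x > 0) + (x < s - 1) for x, s in zip(point, sides))
        total_degree += degree
        pairs += comb(degree, 2)                 # two joins meeting at this point
    return total_degree // 2, pairs


def corrected_p4_coefficient(sides):
    """C_r = -(2 B_r + A_r)."""
    a_r, b_r = join_counts_recursive(sides)
    return -(2 * b_r + a_r)
```

For a 2 × 3 × 4 lattice (c = 9, d = 26, e = 24) this gives $A_3 = 3e - d = 46$ and $B_3 = 15e - 10d + 4c = 136$, in line with (3.1.4).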


3.2. Cumulants for the distribution of black-white joins for two colors. The
first four cumulants for free and non-free sampling distributions in an l × m × n
lattice can be determined from the frequency distributions of black-white joins
for (1, lmn - 1), (2, lmn - 2), (3, lmn - 3) and (4, lmn - 4) black and white
points by the method described for rectangular lattices. If

$\mu'_r = A_{r1}pq + A_{r2}p^2q^2 + \cdots + A_{rr}p^rq^r,$

the first three distributions give the coefficients of $pq$, $p^2q^2$ and $p^3q^3$ in the first
three moments about zero. The three cumulants calculated from these moments
are given below in terms of c, d, and e for free sampling.

(3.2.1) $\kappa_1 = 2(3e - d)pq,$
(3.2.2) $\kappa_2 = 2(18e - 11d + 4c)pq - 4(33e - 21d + 8c)p^2q^2,$
(3.2.3) $\kappa_3 = 2(108e - 91d + 60c - 24)pq - 8(327e - 288d + 198c - 84)p^2q^2 + 32(219e - 197d + 138c - 60)p^3q^3.$


The calculation of the fourth cumulant by the direct method of finding the
frequency distribution of the number of black-white joins for 4 black and (lmn - 4)
white points was found to be very laborious and therefore this has been calculated
by a special method. The coefficients of $pq$, $p^2q^2$ and $p^3q^3$ have been determined,
as in other cases, by finding $\sum x^4f$ for the first three distributions. These
coefficients reduce to a linear form in c, d and e. Now the fourth cumulant, being
a linear function of these quantities, the coefficient of $p^4q^4$ involves c, d and e
in the first degree only and therefore this can be assumed to be of the form

$\alpha e + \beta d + \gamma c + \delta,$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are constants. No simple proof can be given here regarding
the linear assumption of the cumulants. It may be observed that this is true of
the first four cumulants for linear and rectangular lattices. The author [3] has
already provided a general proof of this assumption for the linear lattice and he
hopes to extend this for the higher dimensional lattices in the near future.⁴

The constants $\alpha$, $\beta$, $\gamma$, and $\delta$ can be determined by finding $\kappa_4$ for p = q = 1/2
from the frequency distributions of black-white joins for 2 × 2 × 2 and
2 × 3 × 3 lattices for two colors as given in Tables 7 and 8.

When p = q = 1/2, $\kappa_4$ reduces to the form a′e + b′d + c′c + d′, where a′, b′,
c′ and d′ are constants. In view of this relation, if m and n are fixed, and l takes
values 1, 2, 3, etc., the values of $\kappa_4$ for the different lattices should be in arithmetic
progression. This can be seen by comparing the values of $\kappa_4$ for the lattices
1 × 2 × 2, 2 × 2 × 2 and 3 × 2 × 2 which are 1, 7.5 and 14, respectively.
Using this property, it is possible to find $\kappa_4$ for a lattice of any size from the
complete distribution of the lattices 1 × 2 × 2, 1 × 2 × 3, 1 × 3 × 3, and 2 × 2 × 2
given before. Thus $\kappa_4$ for 2 × 2 × 2, 2 × 2 × 3, 3 × 3 × 2 and 3 × 3 × 3
lattices are 7.5, 14, 25.875 and 47.25 respectively. Now $\alpha$, $\beta$, $\gamma$ and $\delta$ can be
obtained by equating the general expression for the fourth cumulant to the values
given above for the corresponding values of l, m and n and putting p = q = 1/2.
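The equations giving the values of $\alpha$, $\beta$, $\gamma$ and $\delta$ are

(3.2.4)
$8\theta_1 + 12\theta_2 + 6\theta_3 + \theta_4 = 7.5,$
$12\theta_1 + 16\theta_2 + 7\theta_3 + \theta_4 = 14,$
$18\theta_1 + 21\theta_2 + 8\theta_3 + \theta_4 = 25.875,$
$27\theta_1 + 27\theta_2 + 9\theta_3 + \theta_4 = 47.25,$

where

$\theta_1 = \dfrac{32 \times 19176 + \alpha}{256}$, $\theta_2 = \dfrac{-32 \times 21638 + \beta}{256}$, $\theta_3 = \dfrac{32 \times 20952 + \gamma}{256}$, $\theta_4 = \dfrac{-32 \times 16128 + \delta}{256}.$

They give

$\alpha = -32 \times 19143$, $\beta = 32 \times 21615$, $\gamma = -32 \times 20940$, and $\delta = 32 \times 16128.$

⁴ This proof has been obtained recently and will be published soon.
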


TABLE 7
Frequency distribution of black-white joins, 2 × 2 × 2 lattice for two colors

[Table body not recoverable. Columns: No. of black-white joins; No. of black points; Total.]


Thus the general formula for the fourth cumulant is

(3.2.5) $\kappa_4 = 2(648e - 671d + 604c - 432)pq$
$\qquad - 4(9996e - 10857d + 10196c - 7632)p^2q^2$
$\qquad + 32(9144e - 10167d + 9732c - 7416)p^3q^3$
$\qquad - 32(19143e - 21615d + 20940c - 16128)p^4q^4.$
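The determination of $\alpha$, $\beta$, $\gamma$ and $\delta$ amounts to solving four linear equations, one per lattice, once the known $pq$, $p^2q^2$ and $p^3q^3$ terms have been evaluated at p = q = 1/2. The sketch below illustrates that step with exact fractions; the names `LATTICES`, `KNOWN`, `solve` and `fourth_term_constants` are hypothetical and the elimination routine is a generic one, not a method taken from the paper.

```python
from fractions import Fraction

# (e, d, c) and kappa_4 at p = q = 1/2 for the 2x2x2, 2x2x3, 2x3x3, 3x3x3 lattices
LATTICES = [((8, 12, 6), Fraction(15, 2)), ((12, 16, 7), Fraction(14)),
            ((18, 21, 8), Fraction(207, 8)), ((27, 27, 9), Fraction(189, 4))]

# multiplier and coefficients (of e, d, c, 1) for the pq, p^2q^2, p^3q^3 terms
KNOWN = [(2, (648, -671, 604, -432)),
         (-4, (9996, -10857, 10196, -7632)),
         (32, (9144, -10167, 9732, -7416))]


def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def fourth_term_constants():
    """Solve for (alpha, beta, gamma, delta) in the p^4 q^4 coefficient."""
    matrix, rhs = [], []
    for (e, d, c), kappa4 in LATTICES:
        basis = (e, d, c, 1)
        known = sum(mult * sum(k * x for k, x in zip(coef, basis)) * Fraction(1, 4) ** (i + 1)
                    for i, (mult, coef) in enumerate(KNOWN))
        matrix.append(basis)
        rhs.append((kappa4 - known) * 256)       # p^4 q^4 = 1/256 at p = q = 1/2
    return solve(matrix, rhs)
```

Solving for the constants directly, rather than for the $\theta$'s, is a design choice of the sketch; the two are related by the definitions given with (3.2.4).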
For a lattice of sides $l_1, l_2, \cdots, l_r$ in r dimensions, the first two moments
for the distribution of black-white joins for free sampling are as follows:

(3.2.6) $\mu'_1 = 2A_rpq,$

(3.2.7) $\mu_2 = 2(A_r + B_r)pq + 4C_rp^2q^2.$

Like the distributions for linear and rectangular lattices, when l, m and n
tend to infinity, $\gamma_1$ and $\gamma_2$ will tend to zero and therefore the distribution of
black-white joins for an l × m × n lattice also tends to the normal form. The
remarks made in connection with the distribution of black-white joins for a
rectangular lattice are true here also. Here the frequencies for 1, 2, [(3e - d)
- 2] and [(3e - d) - 1] black-white joins are zero, while for 0 and (3e - d)
they are two. But this irregularity will not affect the limiting form of the
distribution since the relative frequencies tend to zero.


TABLE 8
Frequency distribution of black-white joins for 2 × 3 × 3 lattice for two colors

[Table body not recoverable. Columns: No. of black-white joins; No. of black points; Total.]

$\kappa_1 = 10$, $\kappa_2 = 5$, $\kappa_3 = 0$, $\kappa_4 = 14$.


3.3. First and second moments for the distribution of black-white joins for k
colors in an r-dimensional lattice. The results for free sampling follow easily from
a consideration of the expectations of the various ways of obtaining one and
two black-white joins. The expectation of the number of black-white joins is

(3.3.1) $2A_rp_1p_2.$

The expectation for two black-white joins is

$B_rp_1p_2(p_1 + p_2) + 4\left\{\dfrac{A_r(A_r - 1)}{2} - B_r\right\}p_1^2p_2^2.$

From this it will follow that the second moment

(3.3.2) $\mu_2 = 2A_rp_1p_2 + 2B_rp_1p_2(p_1 + p_2) + 4C_rp_1^2p_2^2.$


3.4. First and second moments for the distribution of the total number of joins
between points of different colors for an l × m × n lattice for three colors. The
expectation for free sampling is

(3.4.1) $2(3e - d)\sum p_rp_s.$


TABLE 9
Distribution of joins between points of different colors for 1 black, 1 white and
(lmn - 2) red points

                             Frequency for lattices
No. of joins          2×2×2     2×2×3     2×3×3     3×3×3
     5                   24        16         8         -
     6                   32        56        80       104
     7                    -        56       104       144
     8                    -         4        96       276
     9                    -         -        18       112
    10                    -         -         -        66

Total                    56       132       306       702

Σx²f about zero        1752      5416     15778     44136
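Distributions of this kind can be generated mechanically for any small lattice, which gives a way of checking the totals and the values of $\sum x^2f$. The sketch below is an assumption-laden illustration, not the author's procedure: `single_point_distribution` is a hypothetical name, k is the number of singly occurring colours (k = 2 for 1 black and 1 white point, k = 3 when a single red point is added), and all remaining points share one further colour.

```python
from itertools import permutations, product


def single_point_distribution(sides, k):
    """Frequency of the number of joins between unlike colours when k points
    carry k distinct colours and every other point has one further colour."""
    points = list(product(*(range(s) for s in sides)))
    degree = {pt: sum((x > 0) + (x < s - 1) for x, s in zip(pt, sides))
              for pt in points}

    def adjacent(u, v):
        return sum(abs(x - y) for x, y in zip(u, v)) == 1

    freq = {}
    for chosen in permutations(points, k):
        # a join between two of the chosen points is shared, so count it once
        shared = sum(adjacent(u, v)
                     for i, u in enumerate(chosen) for v in chosen[i + 1:])
        joins = sum(degree[pt] for pt in chosen) - shared
        freq[joins] = freq.get(joins, 0) + 1
    return dict(sorted(freq.items()))


def second_moment_sum(freq):
    """Sum of x^2 f for a tabulated distribution."""
    return sum(x * x * f for x, f in freq.items())
```

Calling `single_point_distribution((2, 2, 3), 2)` returns the frequencies for the 2 × 2 × 3 lattice with one black and one white point; with k = 3 the same function covers the case of 1 black, 1 white, 1 red and (lmn - 3) green points.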


The second moment will involve terms in $\sum p_rp_s$, $p_1p_2p_3$ and $\sum p_r^2p_s^2$. The
coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as those for two colors. The coefficient
of $p_1p_2p_3$ can be determined by finding the frequency distribution of joins
between points of different colors when the lattice consists of 1 black, 1 white and
(lmn - 2) red points. But this straightforward method is cumbersome and
hence the coefficient of $p_1p_2p_3$ has been determined by finding the distribution for
the special lattices 2 × 2 × 2, 2 × 2 × 3, 2 × 3 × 3, and 3 × 3 × 3. These
results are shown in Table 9.

The coefficients of $p_1p_2p_3$ in the corrected second moment for the above lattices
are obtained by subtracting $2(18e - 11d + 4c)(2e - 3) + 8(3e - d)^2$ from the
moments noted above. This can be seen to be so by comparing the above
expression with the quantity subtracted from the uncorrected second moment for
a two dimensional lattice in section 2.4. The coefficients so obtained for 2 × 2 × 2,
2 × 2 × 3, 2 × 3 × 3, and 3 × 3 × 3 lattices are -336, -640, -1184 and
-2142 respectively. Now the coefficient of $p_1p_2p_3$ in the corrected second moment
is of the form

$\alpha'e + \beta'd + \gamma'c + \delta'.$

The equations obtained by equating this expression to -336, -640, -1184
and -2142 for the respective lattices give $\alpha' = -174$, $\beta' = 108$, $\gamma' = -40$

TABLE 10
Distribution of joins between points of different colors when there are 1 black,
1 white, 1 red and (lmn - 3) green points

                             Frequency for lattices
No. of joins          2×2×2     2×2×3     2×3×3     3×3×3
     7                  144        48         -         -
     8                  144       312       288        72
     9                   48       480       912      1344
    10                    -       432      1344      2664
    11                    -        48      1560      4392
    12                    -         -       720      4584
    13                    -         -        72      3168
    14                    -         -         -      1206
    15                    -         -         -       120

Total                   336      1320      4896     17550

Σx²f about zero       20160    110208    531312   2370168





and $\delta' = 0$. Thus the second moment for a lattice with points in three colors is

(3.4.2) $\mu_2 = 2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)p_1p_2p_3 - 4(33e - 21d + 8c)\sum p_r^2p_s^2.$


3.5. First and second moments for the distribution of the total number of joins
between points of different colors in an l × m × n lattice for four or more colors.
The expectations for free sampling are given by the same expression as for three
colors. The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the corrected second
moment are also the same as in section 3.4. The coefficient of $\sum p_rp_sp_tp_u$ can be
determined by the method described in section 3.4 for $p_1p_2p_3$ from the frequency
distributions of joins (Table 10) between points of different colors for 2 × 2 × 2,
2 × 2 × 3, 2 × 3 × 3 and 3 × 3 × 3 lattices when they consist of 1 black, 1
white, 1 red and (e - 3) green points.

The coefficient of $\sum p_rp_sp_tp_u$ in the corrected second moment is obtained by
subtracting (obtained in the same way as for the two dimensional lattice in section
2.5)

$6(18e - 11d + 4c)(e - 2)^2 + (3e - 8)[2(-87e + 54d - 20c) + 8(3e - d)^2] - 8(3e - d)^2$

from the uncorrected values. The values so obtained for the four lattices are
480 (2 × 2 × 2), 928 (2 × 2 × 3), 1736 (2 × 3 × 3) and 3168 (3 × 3 × 3). The
coefficient of $\sum p_rp_sp_tp_u$, as in other cases, being of the form

$\alpha''e + \beta''d + \gamma''c + \delta'',$

$\alpha''$, $\beta''$, $\gamma''$ and $\delta''$ can be determined by equating the above expression to 480,
928, 1736 and 3168 for the respective lattices. The coefficient so obtained is

$8(33e - 21d + 8c).$


Hence the second moment for free sampling when the lattice contains points of
four or more colors is

(3.5.1) $\mu_2 = 2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)\sum p_rp_sp_t - 4(33e - 21d + 8c)\sum p_r^2p_s^2 + 8(33e - 21d + 8c)\sum p_rp_sp_tp_u.$


In general, it will be found that the cumulants involve terms in c, d, e and an
absolute term only. Therefore when l, m and n tend to infinity and $p_1, p_2, p_3, \cdots$
are finite, the distribution of $R - 2(3e - d)\sum p_rp_s$, where R is the total number
of joins of points of different colors, tends to the normal form. When l, m and n
are large,

$\dfrac{R - 2(3e - d)\sum p_rp_s}{\sqrt{e}}$

can be considered to be normally distributed with

(3.5.2) $36\sum p_rp_s - 174\sum p_rp_sp_t - 132\sum p_r^2p_s^2 + 264\sum p_rp_sp_tp_u$

as its variance.

The distribution for non-free sampling here also tends to the normal form for
the same reasons given for the rectangular lattice. As in free sampling, for large
values of l, m and n

(3.5.3) $\dfrac{R - 2(3e - d)\sum e_re_s}{\sqrt{e}}$

is distributed normally with the variance

(3.5.4) $6\sum e_re_se_t + 12\sum e_r^2e_s^2 - 24\sum e_re_se_te_u,$

where R is the observed number of joins for a given distribution of the points
and $e_r = n_r/lmn$. The error in this variance will be about 5% or less when l, m and
n are greater than 36.

We may conclude this section by giving the first and the second moments for
free sampling with k colors for an r-dimensional lattice.

(3.5.5) $\mu'_1 = 2A_r\sum p_rp_s,$

(3.5.6) $\mu_2 = 2(A_r + B_r)\sum p_rp_s + 2(3B_r + 4C_r)\sum p_rp_sp_t + 4C_r\sum p_r^2p_s^2 - 8C_r\sum p_rp_sp_tp_u,$

where $A_r$, $B_r$ and $C_r$ are as defined in section 3.1.
This can be seen from the following facts:

(1) The coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as for two colors.

(2) The coefficient of $\sum p_rp_sp_t$ is the number of ways of getting two joins of
different colors from combinations of points not included in $\sum p_rp_sp_tp_u$. This can
be had from three points of three different colors close together and four points
of three different colors separated into groups of two each such that each group
will give one join. The number of arrangements of the first kind is $3!\,B_r$. For
the second kind it is $8(A_r^2 + C_r)$. Subtracting from the total number the
contribution of $\sum p_rp_sp_t$ in the correction factor $4A_r^2(\sum p_rp_s)^2$, the coefficient of $\sum p_rp_sp_t$
in the second moment works out to be

$2(3B_r + 4C_r).$

(3) The coefficient of $\sum p_rp_sp_tp_u$, as in all other cases dealt with before, is twice that
of $\sum p_r^2p_s^2$ with an opposite sign.
MINIMAX ESTIMATES OF THE MEAN OF A NORMAL DISTRIBUTION
WITH KNOWN VARIANCE

By J. Wolfowitz¹

Columbia University

Summary. It is proved that the classical estimation procedures for the mean
of a normal distribution with known variance are minimax solutions of properly
formulated problems. A result of Stein and Wald [1] is an immediate consequence.
Other such optimum properties follow. Sequential and non-sequential problems
can be treated in this manner. Interval and point estimation are discussed.


1. Sequential estimation by an interval of given length l. In this section we
shall consider the problem of sequentially estimating the mean of a normal
distribution with known variance by an interval of fixed length l. Without loss of
generality we shall take the known variance to be unity. Such a sequential estimation
procedure, which we shall designate generically by G, is a rule which says a)
when to terminate taking random, independent observations on the normal
chance variable with unknown mean $\xi$ ($-\infty < \xi < \infty$) and variance 1, and
when this termination is to occur after the observations $x_1, \cdots, x_n$ have been
obtained, gives b) the center of the estimating interval of length l as a function
of $x_1, \cdots, x_n$. Let $\alpha(\xi, G)$ be the probability under G that the estimating interval
will contain $\xi$, and let $n(\xi, G)$ be the expected number of observations when $\xi$ is
the mean and G is the estimation procedure. (It is assumed that G is such
that $\alpha(\xi, G)$ and $n(\xi, G)$ exist for all $\xi$.)

Define

$q(\xi, G) = 1 - \alpha(\xi, G),$

and for fixed c > 0

(1.1) $W(\xi, G) = q(\xi, G) + c\,n(\xi, G).$

Let C(N, l) (l > 0, N a positive integer) be the classical non-sequential estimation
procedure where one takes the fixed number N of observations, and estimates
the mean by the interval $\left(\bar{x} - \frac{l}{2},\ \bar{x} + \frac{l}{2}\right)$, where $\bar{x}$ is the sample mean. For p
such that 0 < p < 1, let C(p, N, l) be the following estimation procedure: A
chance experiment with two outcomes, N and N + 1, of respective probabilities
p and 1 - p, is performed. One then proceeds according to C(i, l), where i (= N,
N + 1) is the outcome of the experiment. Finally define

$M(y) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_y^{\infty} e^{-t^2/2}\,dt.$

¹ Research under a contract with the Office of Naval Research.
Let us assume for a moment that the unknown $\xi$ is itself a chance variable,
normally distributed with mean zero and variance $\sigma^2$, and let us obtain a
procedure G which minimizes

(1.2) $E\{q(\xi, G) + c\,n(\xi, G)\} = \dfrac{1}{\sigma\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty}\{q(y, G) + c\,n(y, G)\}\exp\left\{-\dfrac{y^2}{2\sigma^2}\right\}dy.$

Let $x_1, \cdots, x_m$ be m independent observations on a normal chance variable
with mean $\xi$ and variance 1. Let

$\bar{x} = \dfrac{1}{m}\sum_{i=1}^{m} x_i.$

The a posteriori distribution of $\xi$, given $x_1, \cdots, x_m$, is easily verified (or see
[1], eqs. (19) and (20)) to be normal with mean

(1.3) $\bar{x}\left[1 + \dfrac{1}{m\sigma^2}\right]^{-1}$

and variance

(1.4) $\left[m + \dfrac{1}{\sigma^2}\right]^{-1}.$


Thus if we stop after m observations the best procedure from the point of view
of minimizing (1.2) is to put the center of the estimating interval of length l at
the point (1.3). The conditional expected value of q( ) is then

(1.5) $Q(x_1, \cdots, x_m \mid \sigma^2) = 2M\left(\dfrac{l}{2}\sqrt{m + \dfrac{1}{\sigma^2}}\right).$

Thus $Q(x_1, \cdots, x_m)$ is a function only of m and $\sigma^2$. Define

(1.6) $R(m, \sigma^2) = 2M\left(\dfrac{l}{2}\sqrt{m + \dfrac{1}{\sigma^2}}\right) - 2M\left(\dfrac{l}{2}\sqrt{m + 1 + \dfrac{1}{\sigma^2}}\right).$

We note that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a decreasing function of m. We conclude
that a best decision as to whether or not to take another observation must be
based on the value of $R(m, \sigma^2)$. If $R(m, \sigma^2) > c$ take another observation; if
$R(m, \sigma^2) < c$ do not take another observation; if $R(m, \sigma^2) = c$ take either action
at pleasure. Hence, if c is such that $R(N, \sigma^2) < c < R(N - 1, \sigma^2)$, a best
procedure from the point of view of minimizing (1.2) is to take exactly N
observations. This integer N is a function of c and $\sigma^2$, thus: $N(c, \sigma^2)$. In the next
paragraph we shall show that $N(c, \sigma^2)$ can be defined for every positive c and $\sigma^2$.
It is clearly a function which takes at most two values. We shall denote by $G(\sigma^2)$
the estimation procedure described above which minimizes (1.2). It consists of
taking the fixed number $N(c, \sigma^2)$ of observations and putting the center of the
estimating interval of length l at the point (1.3). Where $N(c, \sigma^2)$ is double-valued
we may take either value at pleasure. We verify that the value of (1.2) is the
same for either choice.
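The stopping rule can be made concrete numerically. The sketch below evaluates M(y) with the standard library's complementary error function and finds a value of $N(c, \sigma^2)$ by stepping m upward until $R(m, \sigma^2)$ no longer exceeds c. The names `upper_tail`, `gain` and `sample_size`, and the parameter `inv_prior_var` standing for $1/\sigma^2$ (0 for $\sigma^2 = \infty$), are assumptions of the illustration.

```python
from math import erfc, sqrt


def upper_tail(y):
    """M(y): probability that a standard normal variable exceeds y."""
    return 0.5 * erfc(y / sqrt(2.0))


def gain(m, length, inv_prior_var=0.0):
    """R(m, sigma^2) of (1.6): the drop in the expected value of q obtained by
    taking one more observation after m; inv_prior_var is 1/sigma^2."""
    half = length / 2.0
    return (2.0 * upper_tail(half * sqrt(m + inv_prior_var))
            - 2.0 * upper_tail(half * sqrt(m + 1 + inv_prior_var)))


def sample_size(cost, length, inv_prior_var=0.0):
    """Smallest N with R(N, sigma^2) <= cost, i.e. one value of N(c, sigma^2)."""
    n = 0
    while gain(n, length, inv_prior_var) > cost:
        n += 1
    return n
```

With l = 1 and $\sigma^2 = \infty$, a cost of c = 0.2 per observation leads to a single observation, since R(0) is about 0.38 and R(1) about 0.14.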
We now verify that $N(c, \sigma^2)$ can be defined for all positive c and $\sigma^2$. We have
remarked earlier that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a monotonically decreasing
function of m. We note that

$\lim_{m \to \infty} R(m, \sigma^2) = 0.$

When $c > R(0, \sigma^2)$ we take no observations whatever and take $\bar{x} = 0$. When
$c = R(0, \sigma^2)$ we take zero or one observation at pleasure.

Without difficulty we compute

$W(\xi, G(\sigma^2)) = W(\xi, \sigma^2) = cN + M\left(\sqrt{N}\left\{\dfrac{l}{2}\left[1 + \dfrac{1}{N\sigma^2}\right] - \dfrac{\xi}{N\sigma^2}\right\}\right) + M\left(\sqrt{N}\left\{\dfrac{l}{2}\left[1 + \dfrac{1}{N\sigma^2}\right] + \dfrac{\xi}{N\sigma^2}\right\}\right),$

where for typographical simplicity we have written N for $N(c, \sigma^2)$. For fixed c and
$\sigma^2$ the minimum of $W(\xi, \sigma^2)$ occurs at $\xi = 0$. Also $W(0, \sigma^2)$ is a monotonically
increasing function of $\sigma^2$. If $N(c, \infty) > 0$ then, as $\sigma^2 \to \infty$ it approaches the limit

$c\,N(c, \infty) + 2M\left(\dfrac{l}{2}\sqrt{N(c, \infty)}\right),$

which is the constant value of

$W(\xi, C(N(c, \infty), l)).$

We therefore conclude that $C(N(c, \infty), l)$ is a minimax estimating procedure of
type G, i.e.,

$W(\xi, C(N(c, \infty), l)) = \inf_G \sup_\xi W(\xi, G)$

for any c > 0. (The case $N(c, \infty) = 0$ may be verified separately. We define
$\bar{x} = 0$ for C(0, l).)

Conversely, let $N_0$ be a given non-negative integer. Then $C(N_0, l)$ is a minimax
estimating procedure G for all $W(\xi, G)$ for which c satisfies

$R(N_0, \infty) < c < R(N_0 - 1, \infty).$

(We define $R(-1, \infty) = \infty$.) Thus we can say: For every c > 0 there exists a
classical estimation procedure C(N, l) with integral N such that

$W(\xi, C(N, l)) = \inf_G \sup_\xi W(\xi, G).$

For every integral N we can find at least one c > 0 such that the above equation
holds. A method of finding N, given c, and of finding c, given N, has been
described above. (We have taken the liberty of calling C(0, l) a classical procedure.)

Let $\alpha_0$ be a given number such that

$1 - 2M\left(\dfrac{l}{2}\right) \le \alpha_0 < 1.$

Define $p_0$, $0 < p_0 \le 1$, and a positive integral $N_0$ uniquely by

$1 - \alpha_0 = 2p_0M\left(\dfrac{l}{2}\sqrt{N_0}\right) + 2(1 - p_0)M\left(\dfrac{l}{2}\sqrt{N_0 + 1}\right).$

Let

$c_0 = R(N_0, \infty).$


For $c = c_0$ we verify readily that both $C(N_0, l)$ and $C(N_0 + 1, l)$ are minimax
estimating procedures G, so that

$W(\xi, C(N_0, l)) = W(\xi, C(N_0 + 1, l))$
$\qquad = p_0W(\xi, C(N_0, l)) + (1 - p_0)W(\xi, C(N_0 + 1, l))$
$\qquad = (1 - \alpha_0) + c_0[p_0N_0 + (1 - p_0)(N_0 + 1)]$
$\qquad = (1 - \alpha_0) + c_0[N_0 + (1 - p_0)].$

Therefore, for any G whatever,

$(1 - \alpha_0) + c_0[N_0 + (1 - p_0)] \le \sup_\xi\{q(\xi, G) + c_0\,n(\xi, G)\} \le \sup_\xi q(\xi, G) + c_0\sup_\xi n(\xi, G).$

Hence

$\sup_\xi q(\xi, G) \le 1 - \alpha_0$

implies

$\sup_\xi n(\xi, G) \ge N_0 + (1 - p_0),$

a result first proved by Stein and Wald [1].
Also

$\sup_\xi n(\xi, G) \le N_0 + (1 - p_0)$

implies

$\sup_\xi q(\xi, G) \ge 1 - \alpha_0,$

a result also proved in [1].


2. A sequential upper bound for the mean. The fact that in the last section l
was a constant made matters simpler, as we see when we begin to consider the
problem of a sequential upper bound for $\xi$ ($-\infty < \xi < \infty$). This of course means
that we wish to use as estimating interval the interval $(-\infty, L(x_1, \cdots, x_n))$
where L is a function of the observations $x_1, \cdots, x_n$, and n (a chance variable)
is the number of observations before the process of taking observations is
terminated. What is wanted now is a suitable definition of the "length" of this
interval. Also we shall admit the possibility that it might be in some sense
advantageous to have intervals of varying length; this poses the problem of optimum
choice of the function $L(x_1, \cdots, x_n)$.

As before, let $\xi$ be the mean of a normal distribution with unit variance. Let
T be the generic estimation procedure which consists of a rule for terminating the
taking of observations, and of a function $L_T(x_1, \cdots, x_n)$ which is used to
estimate $\xi$ by the interval $(-\infty, L_T)$. Define

$q(\xi, T) = P\{L_T < \xi\},$
$A(\xi, T) = E(L_T - \xi)^2,$

and

(2.1) $W(\xi, T) = q(\xi, T) + kA(\xi, T) + c\,n(\xi, T),$

where c and k are positive constants. (We admit only such T for which the
quantities q, A, and n are defined for all real $\xi$.) As before, let us temporarily assume
that $\xi$ is normally distributed with mean zero and variance $\sigma^2$, and set ourselves
the task of minimizing

(2.2) $\dfrac{1}{\sigma\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty} W(y, T)\,e^{-y^2/(2\sigma^2)}\,dy = W^*(T, \sigma^2)$

with respect to T. In the next paragraph we digress for a moment to derive a
needed elementary inequality.
Let us prove that, if h, h, , and h. are non-negative, and 


(2.3) hi = phi + (1 — p)hi, 
where 0 < p < 1, then 
(2.4) M(h) < p M(h,) + (1 — p) M(he). 


Hold $h$ and $p$ fixed. The desired result is obviously true when
$h_1 = h_2 = h$. Let $h_1$ and $h_2$ vary, subject to (2.3). Then

\[ \frac{dh_2}{dh_1} = -\frac{p h_1}{(1 - p) h_2}. \]

Also
\[ p \frac{dM(h_1)}{dh_1} = -\frac{p}{\sqrt{2\pi}}\, e^{-h_1^2/2} \]
and
\[ (1 - p) \frac{dM(h_2)}{dh_1} = (1 - p) \frac{dM(h_2)}{dh_2} \frac{dh_2}{dh_1}
   = \frac{p h_1}{\sqrt{2\pi}\, h_2}\, e^{-h_2^2/2}. \]

Thus the derivative of the right member of (2.4) with respect to $h_1$ is 0 when
$h_1 = h$, positive when $h_1 > h$, and negative when $h_1 < h$. From this we
get (2.4).
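The inequality (2.4) can be checked numerically. The sketch below is an illustration only: it assumes that M(h) denotes the upper-tail probability of the standard normal distribution (the reading consistent with the derivatives used above), and the helper name `both_sides` is hypothetical.

```python
import math


def M(h):
    # Assumption: M(h) = P(Z > h) for a standard normal Z.
    return 0.5 * math.erfc(h / math.sqrt(2.0))


def both_sides(h1, h2, p):
    """Return (M(h), p*M(h1) + (1-p)*M(h2)), with h defined through (2.3)."""
    h = math.sqrt(p * h1 ** 2 + (1.0 - p) * h2 ** 2)
    return M(h), p * M(h1) + (1.0 - p) * M(h2)


if __name__ == "__main__":
    for h1, h2, p in [(0.0, 2.0, 0.3), (1.0, 1.0, 0.5), (0.5, 3.0, 0.9)]:
        lhs, rhs = both_sides(h1, h2, p)
        print(f"h1={h1}, h2={h2}, p={p}: M(h)={lhs:.6f}, mixture={rhs:.6f}")
```

On every non-negative pair tried, the left member should not exceed the right member, with equality when the two arguments coincide.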








Let $T$ be any estimation procedure and $L_T(x_1, \dots, x_n)$ its associated
function. Write

\[ l_T(x_1, \dots, x_n) = L_T(x_1, \dots, x_n) - \bar{x}\left[1 + \frac{1}{n\sigma^2}\right]^{-1}. \]


If $n = m$ and $x_1, \dots, x_m$ is the sample obtained, we have that the
conditional expected value of $W^*(T, \sigma^2)$ is

\[ M\left(l_T(x_1, \dots, x_m) \sqrt{m + \frac{1}{\sigma^2}}\right) + cm + kE(U_m + l_T(x_1, \dots, x_m))^2, \tag{2.5} \]

where $U_m$ is a normally distributed chance variable with mean zero and variance
$(m + 1/\sigma^2)^{-1}$. The last term in (2.5) is therefore

\[ k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + l_T^2(x_1, \dots, x_m)\right]. \]


This is an even function of $l_T$, while the first term of (2.5) is a
monotonically decreasing function of $l_T$. Thus (2.5) and hence
$W^*(T, \sigma^2)$ will be minimized by taking $l_T$ non-negative. Now take the
expected value of (2.5) over the set of samples where $n = m$. Application of
the result of the preceding paragraph to the finite sums which approximate the
integral gives the result that $W^*(T, \sigma^2)$ is minimized when
$l_T(x_1, \dots, x_m)$ is a function only of $m$. Hence we may restrict ourselves
to consideration of procedures $T$ for which (2.5) takes the value

\[ M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l_T(m)\right) + cm + k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + l_T^2(m)\right]. \tag{2.6} \]


For any such procedure $T$, since $k$ and $c$ are fixed positive numbers (and
$\sigma^2$ is held fixed for the present), the expression (2.6) takes its
minimum for some value of $m$. Thus, in our quest for a procedure $T$ which will
minimize $W^*(T, \sigma^2)$ we may restrict ourselves to procedures of fixed
sample size. This fixed sample size and the (constant) value of $l_T$ are
functions of $k$, $c$, and $\sigma^2$. For fixed $m$,

\[ M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l'\right) + k(l')^2 \]

has an absolute minimum at $l_m$, say, since it is a continuous function of $l'$
($l' \ge 0$) which approaches $\infty$ with $l'$. The case $m = 0$ must be
considered. (In this event $\bar{x} = 0$.) Now consider the sequence

\[ \left\{ M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l_m\right) + cm + k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + l_m^2\right] \right\} \]

for $m = 0, 1, 2, \dots$ ad inf. This sequence condenses only at $\infty$. Hence
there exists a value $N(k, c, \sigma^2)$ of $m$ for which the elements of this
sequence have a minimum value. We may choose $N(k, c, \sigma^2)$ so that
$\lim_{\sigma^2 \to \infty} N(k, c, \sigma^2)$ exists. (We verify easily that
this is always possible.) Designate this limit by $N(k, c, \infty)$, and the
associated $l$ by $l(k, c, \infty)$. The $l$ associated with $N(k, c, \sigma^2)$
will be designated by $l(k, c, \sigma^2)$. Thus a best procedure for minimizing
$W^*(T, \sigma^2)$ is to take the fixed number $N(k, c, \sigma^2)$ of
observations, and to use, as upper bound for $\xi$, the quantity

\[ \bar{x}\left[1 + \frac{1}{N\sigma^2}\right]^{-1} + l(k, c, \sigma^2). \]


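We see readily that
\[ l(k, c, \infty) = \lim_{\sigma^2 \to \infty} l(k, c, \sigma^2) \]
and that

\[ M\left(\sqrt{N(k, c, \infty)}\; l(k, c, \infty)\right) = \lim_{\sigma^2 \to \infty} M\left(\sqrt{N(k, c, \sigma^2) + \frac{1}{\sigma^2}}\; l(k, c, \sigma^2)\right). \]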


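Let $T(\sigma^2)$ be the procedure described above which is a best procedure $T$
in the sense of minimizing $W^*(T, \sigma^2)$ when $\sigma^2$ is the variance of
$\xi$. We now compute $W(\xi, T(\sigma^2))$ and obtain

\[ W(\xi, T(\sigma^2)) = cN + k\left[\frac{N\sigma^4}{(1 + N\sigma^2)^2} + \left(l - \frac{\xi}{1 + N\sigma^2}\right)^2\right]
   + M\left(\sqrt{N}\, \frac{1 + N\sigma^2}{N\sigma^2} \left[l - \frac{\xi}{1 + N\sigma^2}\right]\right), \]

where for brevity we have written $N$ and $l$ for $N(k, c, \sigma^2)$ and
$l(k, c, \sigma^2)$. Let

\[ x = l - \frac{\xi}{1 + N\sigma^2}, \qquad \frac{1 + N\sigma^2}{\sqrt{N}\,\sigma^2} = \sqrt{N} + \epsilon. \tag{2.7} \]

Then
\[ W = cN + k\left[\frac{1}{(\sqrt{N} + \epsilon)^2} + x^2\right] + M([\sqrt{N} + \epsilon]\, x), \tag{2.8} \]

\[ \frac{\partial W}{\partial x} = 2kx - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}\{(\sqrt{N} + \epsilon)^2 x^2\}\right]. \tag{2.9} \]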


The second term above is always of the same sign and the exponential decreases
as $|x|$ increases. Thus $\partial W/\partial x = 0$ has the unique positive root
$x^*$. Put $x^*$ for $x$ in $W$ (in (2.8)) and call the result $W^*$. $W$ is a
continuous function of $x$ and approaches $\infty$ as $|x| \to \infty$. Since the
root $x^*$ is unique it follows that $W^*$ is the minimum value of $W$ with
respect to $x$. Now $N(k, c, \sigma^2)$ is constant for $\sigma^2$ sufficiently
large. Hence, for such $\sigma^2$, we have

\[ \frac{\partial W^*}{\partial \epsilon}
   = -\frac{2k}{(\sqrt{N} + \epsilon)^3}
     + \left\{2kx^* - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right]\right\} \frac{dx^*}{d\epsilon}
     - \frac{x^*}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right]
   = -\frac{2k}{(\sqrt{N} + \epsilon)^3} - \frac{x^*}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right], \tag{2.10} \]

since $x^*$ is the root of $\partial W/\partial x = 0$. Also $\epsilon$ is
positive and, for $\sigma^2$ sufficiently large, approaches zero monotonically as
$\sigma$ approaches $\infty$. For $\epsilon > 0$ we have that
$\partial W^*/\partial \epsilon < 0$, since $x^* > 0$. We conclude: For
$\sigma^2$ sufficiently large,


\[ \inf_\xi W(\xi, T(\sigma^2)) \]
increases monotonically with $\sigma^2$ and approaches
\[ cN + k\left[\frac{1}{N} + \{x_N(k)\}^2\right] + M(\sqrt{N}\, x_N(k)), \]

where $N$ is short for $N(k, c, \infty)$ and $x_N(k)$ is the unique positive root
of the equation in $x$

\[ 2kx = \sqrt{\frac{N}{2\pi}} \exp\left[-\tfrac{1}{2} N x^2\right]. \]

Going back to the definition of $l(k, c, \infty)$ we see that the latter
satisfies the equation in $l$:

\[ \frac{\partial}{\partial l} \{M(\sqrt{N}\, l) + kl^2\} = 0. \]
Hence
\[ x_N(k) = l(k, c, \infty). \]
Thus the classical estimation procedure $C_0$ where one takes the fixed number
$N(k, c, \infty)$ of observations and uses as upper bound for the mean
$\bar{x} + l(k, c, \infty)$ is a minimax procedure $T$, i.e.,

\[ W(\xi, C_0) = \inf_T \sup_\xi W(\xi, T). \]


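For fixed $N$, $x_N(k)$ decreases monotonically from $+\infty$ to 0 as $k$
increases from 0 to $+\infty$. Hence, for given positive integral $N_0$ and
$l^* > 0$, there is a unique positive value $k_0$ such that
$x_{N_0}(k_0) = l^*$. Consider the expression

\[ B(m) = M(\sqrt{m}\, x_m(k_0)) + cm + k_0\left[\frac{1}{m} + \{x_m(k_0)\}^2\right], \tag{2.11} \]

where $m$ is a positive, continuous variable. We have

\[ \frac{dB(m)}{dm} = c - \frac{k_0}{m^2}
   + \frac{dx_m(k_0)}{dm}\, \frac{\partial}{\partial x_m(k_0)} \left\{M(\sqrt{m}\, x_m(k_0)) + k_0 [x_m(k_0)]^2\right\}
   - \frac{x_m(k_0)}{2\sqrt{2\pi m}} \exp\left\{-\tfrac{1}{2} m [x_m(k_0)]^2\right\}. \tag{2.12} \]

The third term of the right member is identically zero because

\[ 2k_0 x_m(k_0) = \sqrt{\frac{m}{2\pi}} \exp\left\{-\tfrac{1}{2} m [x_m(k_0)]^2\right\}. \tag{2.13} \]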


Further we have

\[ \frac{d^2 B(m)}{dm^2} = \frac{2k_0}{m^3} - \frac{d}{dm}\left\{\frac{x_m(k_0)}{2\sqrt{2\pi m}}\, e^{-\frac{1}{2} m x_m^2(k_0)}\right\}
   = \frac{2k_0}{m^3} - k_0 \frac{d}{dm}\left\{m^{-1} (x_m(k_0))^2\right\}. \tag{2.14} \]

For typographic simplicity we shall use $y$ for $x_m(k_0)$ in the computations of
the next few lines. From (2.13) we obtain

\[ \log 2k_0 + \log y = -\log \sqrt{2\pi} + \tfrac{1}{2} \log m - \tfrac{1}{2} m y^2, \]

\[ \frac{1}{y} \frac{dy}{dm} = \frac{1}{2m} - \frac{y^2}{2} - m y \frac{dy}{dm}, \]

\[ \frac{dy}{dm} = \frac{y(1 - my^2)}{2m(1 + my^2)}. \]

Hence
\[ \frac{d^2 B(m)}{dm^2} = 2k_0 m^{-3} + k_0 m^{-2} y^2 - 2k_0 m^{-1} y \frac{dy}{dm}
   = 2k_0 m^{-3} + k_0 m^{-2} y^2 - \frac{k_0 y^2 (1 - my^2)}{m^2 (1 + my^2)}
   = 2k_0 m^{-3} + \frac{2y^4 k_0}{m(1 + my^2)}. \tag{2.15} \]


Since $c > 0$, we have
\[ \lim_{m \to 0} B(m) = \lim_{m \to \infty} B(m) = +\infty. \]

Hence there exists a value of $m$ for which $B(m)$ takes its minimum value. If in
$dB(m)/dm$ we put $m = N_0$ and set the resulting expression equal to zero, we
obtain an equation in $c$ whose unique solution $c_0$, if it is positive, assures
us that, when $c = c_0$ and $k = k_0$, $B(m)$ takes its minimum at $m = N_0$. A
simple computation gives

\[ c_0 = \frac{k_0}{N_0^2} + \frac{l^* \exp\{-\tfrac{1}{2} N_0 l^{*2}\}}{2\sqrt{2\pi N_0}}. \tag{2.16} \]

Actually we are interested in considering $B(m)$ only for positive integral
values of $m$. We see readily that the minimum of $B(m)$ occurs then at
$m = N_0$ when $c$ is such that

\[ c_1(N_0, k_0) \le c \le c_2(N_0, k_0), \tag{2.17} \]
with $c_1$ and $c_2$ roots of the following equations in $c$:

\[ B(N_0) = B(N_0 + 1), \]
\[ B(N_0) = B(N_0 - 1). \]

(If $N_0 = 1$, then $c_2 = \infty$.)

Let $C_0(N_0, l^*)$ be the classical (non-sequential) procedure where one takes
$N_0$ observations and uses $\bar{x} + l^*$ as upper bound for the mean. Choose
$k = k_0$ and $c$ such that (2.17) is satisfied. Then

\[ W(\xi, C_0(N_0, l^*)) = cN_0 + k\left[\frac{1}{N_0} + l^{*2}\right] + M(\sqrt{N_0}\, l^*) \]

identically in $\xi$. $C_0(N_0, l^*)$ is a procedure $T$ such that
\[ W(\xi, C_0) = \inf_T \sup_\xi W(\xi, T). \tag{2.18} \]

Whenever $c$ and $k$ are given, the $N$ and $l$ of the minimax solution may
be obtained as follows: First we obtain an integer $N$ such that
\[ c_1(N, k) \le c \le c_2(N, k). \]
Knowing $N$ and $k$ we can then solve for $l$.
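The recipe just described can be sketched numerically. The code below is an illustration under stated assumptions, not the author's computation: M is taken to be the standard normal upper-tail probability, the root of (2.13) is found by bisection, and the integer N is found by scanning B(m) upward, which relies on the convexity shown in (2.15). The function names `x_root`, `B`, and `minimax_fixed_sample` are hypothetical.

```python
import math


def M(h):
    # Assumption: M(h) = P(Z > h) for a standard normal Z.
    return 0.5 * math.erfc(h / math.sqrt(2.0))


def x_root(m, k):
    """Positive root x of 2*k*x = sqrt(m/(2*pi)) * exp(-m*x*x/2), cf. (2.13)."""
    a = math.sqrt(m / (2.0 * math.pi))

    def f(x):
        return 2.0 * k * x - a * math.exp(-0.5 * m * x * x)

    lo, hi = 0.0, a / (2.0 * k)  # f(lo) < 0 and f(hi) > 0, f is increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def B(m, c, k):
    """Expression (2.11) evaluated with k0 = k."""
    x = x_root(m, k)
    return M(math.sqrt(m) * x) + c * m + k * (1.0 / m + x * x)


def minimax_fixed_sample(c, k, m_max=10 ** 6):
    """Return (N, l): the integer minimizing B over m >= 1 and its bound offset."""
    m = 1
    while m < m_max and B(m + 1, c, k) < B(m, c, k):
        m += 1
    return m, x_root(m, k)


if __name__ == "__main__":
    N, l = minimax_fixed_sample(c=0.01, k=1.0)
    print(N, l)
```

The returned pair would then be used as in the text: take N observations and report the sample mean plus l as the upper bound.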

The results of this section may be summarized as follows: For every positive
$c$ and $k$ there exists a classical estimation procedure $C(N, l)$ with positive
integral $N$ and $l > 0$ such that (2.18) holds. Conversely, for every such pair
$(N, l)$ there exists a positive pair $(c, k)$ so that (2.18) holds. A method of
finding one member of the pair of couples $(c, k)$ and $(N, l)$ when the other is
given, has been indicated above.


Let $T_1$ be any procedure for giving an upper bound for $\xi$. We shall say that
$T_1$ is optimum if for any other procedure $T_2$ such that

\[ \sup_\xi q(\xi, T_2) \le \sup_\xi q(\xi, T_1), \]
\[ \sup_\xi A(\xi, T_2) \le \sup_\xi A(\xi, T_1), \]
we have
\[ \sup_\xi n(\xi, T_2) \ge \sup_\xi n(\xi, T_1). \]
It is easy to prove that the classical procedure $C_0$ with any positive $l$ and
positive integral $N$ is optimum by using the results of the last paragraph. For
let $1 - \alpha = M(l\sqrt{N})$ and let $k$ and $c$ be the corresponding
parameters. We have then

\[ \sup_\xi q(\xi, T_2) + k \sup_\xi A(\xi, T_2) + c \sup_\xi n(\xi, T_2)
   \ge \sup_\xi \{q(\xi, T_2) + kA(\xi, T_2) + c\,n(\xi, T_2)\}
   \ge (1 - \alpha) + k\left(\frac{1}{N} + l^2\right) + cN. \]

Since $\sup_\xi q(\xi, T_2) \le (1 - \alpha)$ and
$\sup_\xi A(\xi, T_2) \le 1/N + l^2$, we must have
\[ \sup_\xi n(\xi, T_2) \ge N, \]
which is the desired result.


In a general imprecise way we may say that an estimation procedure is the
better the smaller the three quantities

\[ \beta_1(T) = \sup_\xi q(\xi, T), \quad \beta_2(T) = \sup_\xi A(\xi, T), \quad \beta_3(T) = \sup_\xi n(\xi, T). \]
We can now assert the following: No sequential procedure $T$ can be superior to
the classical fixed sample procedure $C$ in the sense that

\[ \beta_i(T) \le \beta_i(C) \quad \text{for } i = 1, 2, 3 \]

and the inequality sign holds for at least one $i$.

In concluding this section we may remark that the case $\alpha < \tfrac{1}{2}$,
i.e., $l < 0$, may be handled in the same manner as above except that we use
$M(-l\sqrt{m})$ in place of $M(l\sqrt{m})$.


3. Miscellaneous results; point estimation. Without going into the neces- 
sarily involved details, we content ourselves with pointing out that the problem 
of estimating sequentially the mean of a normal distribution by a finite interval 
of length not specified in advance, can be solved in similar fashion. As before 
let $\xi$ be the unknown mean of a normal distribution with unit variance, where
$\xi$ may be any real value. We want to estimate by an interval

\[ (L_1(x_1, \dots, x_n),\; L_2(x_1, \dots, x_n)). \]

Let $c$, $k_1$, and $k_2$ be positive constants and consider the problem of
minimizing the supremum with respect to $\xi$ of

\[ 1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + c\,n(\xi, G')
   + k_1 E[(L_1 - \xi)^2 \mid \xi, G'] + k_2 E[(L_2 - \xi)^2 \mid \xi, G'], \]

where $G'$ is the generic designation of the estimation procedure. As before,
employ an a priori normal distribution of $\xi$ with mean zero and variance
$\sigma^2$, and let $\sigma \to \infty$. A fixed sample size procedure will be a
minimax solution. It will possess optimum properties similar to those described
in the preceding sections. The problem of minimizing the supremum with respect to
$\xi$ of

\[ 1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + c\,n(\xi, G') + kE\{(L_2 - L_1)^2 \mid \xi, G'\} \]

can be treated similarly.


Suppose the sample size is fixed in advance. The problem of finding an estimate 
which will minimize 


or 
\[ \sup_\xi \left[1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + kE\{(L_2 - L_1)^2 \mid \xi, G'\}\right] \]
can be treated by the method of the preceding sections. 

The problem of estimating (sequentially or with fixed sample size) the means 
of a multivariate normal distribution with known covariance matrix can be 
treated in similar fashion. 

Suppose it is desired to estimate sequentially the mean $\xi$
($-\infty < \xi < \infty$) of a normal distribution with unit variance by means
of a chance point $\hat{\xi}(x_1, \dots, x_n)$. Let $R(\xi, \xi')$ be the Wald
risk function (cf. [2]), a non-negative function which measures the loss incurred
in using the particular value $\xi'$ as an estimate when $\xi$ is the actual
value. The functions $\hat{\xi}(x_1, \dots, x_n)$ and $R(\xi, \xi')$ must have
suitable measurability properties for which we refer the reader to [2]. Let us
seek a procedure $\hat{\xi}^*$ such that

\[ \sup_\xi [E\{R(\xi, \hat{\xi}^*)\} + c\,n(\xi, \hat{\xi}^*)] = \inf_{\hat{\xi}} \sup_\xi [E\{R(\xi, \hat{\xi})\} + c\,n(\xi, \hat{\xi})]. \]

Here $n(\xi, \hat{\xi})$ is the average number of observations under $\hat{\xi}$
when $\xi$ is the "true" mean. The procedure $\hat{\xi}^*$ will be called a
minimax solution. We shall assume that $R(a, b)$ is a monotonically
non-decreasing function of $|a - b|$, and that there exists a positive number $g$
such that
\[ \int_{-\infty}^{\infty} R(0, x) \exp\left\{-\frac{x^2}{2g}\right\} dx < \infty. \]
As examples of functions with these properties we may cite
\[ R(a, b) = |a - b|, \]
\[ R(a, b) = (a - b)^2. \]


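As before, assume temporarily that $\xi$ is normally distributed with mean zero
and variance $\sigma^2$. We verify without difficulty that a solution
$\hat{\xi} = \hat{\xi}_\sigma$ which minimizes

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} [E\{R(\xi, \hat{\xi})\} + c\,n(\xi, \hat{\xi})] \exp\left\{-\frac{\xi^2}{2\sigma^2}\right\} d\xi \]

is the following: $n$ is identically a suitable constant, say $N$, and
$\hat{\xi}_\sigma$ is $\bar{x}(1 + 1/N\sigma^2)^{-1} = \bar{x}h$, say, so that
$h < 1$. For this solution we have

\[ E\{R(\xi, \hat{\xi}_\sigma)\} + c\,n(\xi, \hat{\xi}_\sigma) = cN + \sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(\xi, \bar{x}h) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x}. \]

Write $u = \bar{x} - \xi$. Then

\[ R(\xi, \bar{x}h) = R(\xi, h[\xi + u]) = R(0, hu - [1 - h]\xi), \]

\[ \int_{-\infty}^{\infty} R(\xi, \bar{x}h) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x}
   = \int_{-\infty}^{\infty} R(0, hu - [1 - h]\xi) \exp\left\{-\frac{N}{2} u^2\right\} du
   = \frac{1}{h} \int_{-\infty}^{\infty} R(0, v) \exp\left\{-\frac{N}{2h^2}(v + [1 - h]\xi)^2\right\} dv. \]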


Because of the assumptions on the function $R$ the last expression is a minimum
when $\xi = 0$. We may always choose $N$ such that, for large enough $\sigma$, the
integer $N$ is a constant, say $N_0$. Also $h \to 1$ as $\sigma^2 \to \infty$.
Thus we conclude that the following is a minimax solution: $n = N_0$ and
$\hat{\xi} = \hat{\xi}^* = \bar{x}$. If any estimation procedure $\hat{\xi}$ is
such that $\sup_\xi n(\xi, \hat{\xi}) \le N_0$ then

\[ \sup_\xi E\{R(\xi, \hat{\xi})\} \ge E\{R(\xi, \hat{\xi}^*)\}. \]

If $\hat{\xi}$ is such that
\[ \sup_\xi E\{R(\xi, \hat{\xi})\} \le E\{R(\xi, \hat{\xi}^*)\}, \]
then
\[ \sup_\xi n(\xi, \hat{\xi}) \ge N_0. \]


If the restrictions imposed above on $R$ are satisfied and if the sample must
always be of given size $N$, the above argument still holds when $1/N < g$, and
shows that the estimate $\bar{x}$ minimizes

\[ \sup_\xi E\{R(\xi, \hat{\xi})\} \]
with respect to $\hat{\xi}$.

ASYMPTOTIC PROPERTIES OF THE WALD-WOLFOWITZ TEST
OF RANDOMNESS


By GOTTFRIED EMANUEL NOETHER


New York University 


1. Summary. The paper investigates certain asymptotic properties of the
test of randomness based on the statistic $R_h = \sum_{i=1}^{n} x_i x_{i+h}$
proposed by Wald and Wolfowitz. It is shown that the conditions given in the
original paper for asymptotic normality of $R_h$ when the null hypothesis of
randomness is true can be weakened considerably. Conditions are given for the
consistency of the test when under the alternative hypothesis consecutive
observations are drawn independently from changing populations with continuous
cumulative distribution functions. In particular a downward (upward) trend and a
regular cyclical movement are considered. For the special case of a regular
cyclical movement of known length the asymptotic relative efficiency of the test
based on ranks with respect to the test based on original observations is found.
A simple condition for the asymptotic normality of $R_h$ for ranks under the
alternative hypothesis is given. This asymptotic normality is used to compare the
asymptotic power of the $R_h$-test with that of the Mann $T$-test in the case of
a downward trend.


2. Introduction. The hypothesis of randomness, i.e., the assumption that the
chance variables $X_1, \dots, X_n$ have the joint cumulative distribution
function (cdf) $F(x_1, \dots, x_n) = F(x_1) \cdots F(x_n)$ where $F(x)$ may be
any cdf, is basic in many statistical problems. Several tests of randomness
designed to detect changes in the underlying population have been suggested,
however mostly on intuitive grounds. Very seldom has the actual performance of a
test with respect to a given class of alternatives been investigated. It is the
intention of this paper to carry out such an investigation for the particular
test based on the statistic

\[ R_h = \sum_{i=1}^{n} x_i x_{i+h}, \qquad x_{n+j} = x_j, \]


proposed by Wald and Wolfowitz [1]. It is suggested in [1] that this test is 
suitable if the alternative to randomness is the existence of a trend or a regular 
cyclical movement. Both these cases will be treated. 

Let $a_1, \dots, a_n$ be observations on the chance variables $X_1, \dots, X_n$
and assume that the hypothesis of randomness is true. (Henceforth this hypothesis
will be denoted by $H_0$ while the hypothesis that an alternative to randomness
is true will be denoted by $H_1$.) Restricting then $X_1, \dots, X_n$ to the
subpopulation of permutations of $a_1, \dots, a_n$, any one of the $n!$ possible
permutations is equally likely, and the distribution of $R_h$ in this
subpopulation can be found. If the level of significance $\alpha$ is chosen in
such a way that $\alpha = m/n!$ where $m$ is a positive integer, the test is
performed by selecting $m$ of the $n!$ possible values of $R_h$ and rejecting
$H_0$ when the actually obtained value of $R_h$ is one of these $m$ values. The
particular choice of the critical values should be such as to maximize the power
of the test with respect to the class of alternatives under consideration.

Denote the expected value and variance of $R_h$ in the subpopulation of equally
likely permutations of $n$ observations $a_1, \dots, a_n$ by $E^0 R_h$ and
$V^0 R_h$, respectively. Then it is shown in [1] that if $h$ is prime to $n$

\[ E^0 R_h = \frac{A_1^2 - A_2}{n - 1} \tag{2.1} \]
and
\[ V^0 R_h = \frac{A_2^2 - A_4}{n - 1}
   + \frac{A_1^4 - 4A_1^2 A_2 + 4A_1 A_3 + A_2^2 - 2A_4}{(n - 1)(n - 2)}
   - \frac{1}{(n - 1)^2} (A_1^2 - A_2)^2, \tag{2.2} \]

where $A_r = a_1^r + \cdots + a_n^r$, $(r = 1, 2, 3, 4)$. Actually (2.1) and
(2.2) are valid as soon as $n > 2h$.
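The permutation moments (2.1) and (2.2) can be checked by brute force on a small sample. The sketch below is illustrative only: the function names are hypothetical, the data are arbitrary integers, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations


def circular_moments_formula(a):
    """Mean and variance of the circular R_h from the power sums A_1, ..., A_4."""
    n = len(a)
    A1, A2, A3, A4 = (sum(x ** r for x in a) for r in (1, 2, 3, 4))
    mean = Fraction(A1 ** 2 - A2, n - 1)
    var = (
        Fraction(A2 ** 2 - A4, n - 1)
        + Fraction(A1 ** 4 - 4 * A1 ** 2 * A2 + 4 * A1 * A3 + A2 ** 2 - 2 * A4,
                   (n - 1) * (n - 2))
        - Fraction((A1 ** 2 - A2) ** 2, (n - 1) ** 2)
    )
    return mean, var


def circular_moments_bruteforce(a, h):
    """Same moments obtained by enumerating all n! equally likely permutations."""
    n = len(a)
    vals = [sum(p[i] * p[(i + h) % n] for i in range(n)) for p in permutations(a)]
    mean = Fraction(sum(vals), len(vals))
    var = Fraction(sum(v * v for v in vals), len(vals)) - mean ** 2
    return mean, var


if __name__ == "__main__":
    sample = [1, 2, 4, 7, 11]  # arbitrary illustrative data, n = 5 > 2h
    print(circular_moments_formula(sample))
    print(circular_moments_bruteforce(sample, h=1))
```

Enumeration is only feasible for very small n; it serves here as a check on the closed forms, not as a way to run the test.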

Let $R_h^0 = (R_h - E^0 R_h)/\sqrt{V^0 R_h}$. Then it is also shown in [1] that
if $h$ is prime to $n$, $R_h^0$ is asymptotically normally distributed with mean
0 and variance 1 provided the $a_i$, $(i = 1, \dots, n)$, satisfy condition W:

\[ \frac{\frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^r}{\left[\frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^2\right]^{r/2}} = O(1),^1 \qquad (r = 3, 4, \dots), \]

where $\bar{a} = n^{-1} \sum_{i=1}^{n} a_i$.

It is easily seen that condition W is satisfied when the original observations
are replaced by ranks. When the $a_1, \dots, a_n$ are independent observations on
the same chance variable $X$, condition W is satisfied with probability 1
provided $X$ has positive variance and finite moments of all orders. It is
interesting to compare this condition for asymptotic normality of $R_h$ in the
population of permutations of observations on the chance variable $X$ with the
condition for asymptotic normality of $R_h$ under random sampling. For this case
Hoeffding and Robbins [3] have shown that it is sufficient to assume that $X$ has
a finite absolute moment of order 3. Thus it is desirable to weaken condition W.
This will be done in Section 3.

1 The symbol $O$, as well as the symbols $o$ and $\sim$ to be used later, have
their usual meaning. See, for example, Cramér [2], p. 122.

In further sections the consistency and efficiency of the test based on $R_h$
will be examined assuming that under the alternative hypothesis observations,
though still independent, are drawn from changing populations. Throughout the
paper the circularly defined statistic $R_h$ is used. However, if with
probability 1

\[ x_{n-h+1} x_1 + \cdots + x_n x_h = o(R_h), \]
it is seen that asymptotically the test based on the non-circular
\[ \bar{R}_h = \sum_{i=1}^{n-h} x_i x_{i+h} \]
has the same properties as that based on $R_h$. We find
\[ E^0 \bar{R}_h = \frac{n - h}{n(n - 1)} (A_1^2 - A_2), \]
\[ V^0 \bar{R}_h = \frac{n - h}{n(n - 1)} (A_2^2 - A_4)
   + \frac{2(n - 2h)}{n(n - 1)(n - 2)} (A_1^2 A_2 - A_2^2 - 2A_1 A_3 + 2A_4)
   + \frac{(n - h - 1)(n - h - 2) + 2(h - 1)}{n(n - 1)(n - 2)(n - 3)} (A_1^4 - 6A_1^2 A_2 + 8A_1 A_3 + 3A_2^2 - 6A_4)
   - \frac{(n - h)^2}{n^2 (n - 1)^2} (A_1^2 - A_2)^2. \]
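As with the circular statistic, the moments of the non-circular statistic can be compared with a direct enumeration on a small sample. This is a sketch under the reading of the formulas given above (sum over i = 1, ..., n - h, with n at least 2h); the function names are hypothetical and the data arbitrary.

```python
from fractions import Fraction
from itertools import permutations


def noncircular_moments_formula(a, h):
    """Permutation mean and variance of sum_{i=1}^{n-h} x_i x_{i+h}."""
    n = len(a)
    A1, A2, A3, A4 = (sum(x ** r for x in a) for r in (1, 2, 3, 4))
    mean = Fraction(n - h, n * (n - 1)) * (A1 ** 2 - A2)
    var = (
        Fraction(n - h, n * (n - 1)) * (A2 ** 2 - A4)
        + Fraction(2 * (n - 2 * h), n * (n - 1) * (n - 2))
        * (A1 ** 2 * A2 - A2 ** 2 - 2 * A1 * A3 + 2 * A4)
        + Fraction((n - h - 1) * (n - h - 2) + 2 * (h - 1),
                   n * (n - 1) * (n - 2) * (n - 3))
        * (A1 ** 4 - 6 * A1 ** 2 * A2 + 8 * A1 * A3 + 3 * A2 ** 2 - 6 * A4)
        - Fraction((n - h) ** 2, n ** 2 * (n - 1) ** 2) * (A1 ** 2 - A2) ** 2
    )
    return mean, var


def noncircular_moments_bruteforce(a, h):
    """Same moments by enumerating all n! equally likely permutations."""
    n = len(a)
    vals = [sum(p[i] * p[i + h] for i in range(n - h)) for p in permutations(a)]
    mean = Fraction(sum(vals), len(vals))
    var = Fraction(sum(v * v for v in vals), len(vals)) - mean ** 2
    return mean, var


if __name__ == "__main__":
    sample = [1, 2, 4, 7, 11]  # arbitrary illustrative data
    print(noncircular_moments_formula(sample, 1))
    print(noncircular_moments_bruteforce(sample, 1))
```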
3. Asymptotic normality of $R_h$ under randomization. Let the set of chance
variables $X_1, \dots, X_n$ be defined on the $n!$ equally likely permutations of
$n$ numbers $\mathfrak{A}_n = (a_1, \dots, a_n)$. Then we have

THEOREM 1: The distribution of $R_h^0$ tends to the normal distribution with
mean 0 and variance 1 as $n \to \infty$ provided

\[ \frac{\sum_{i=1}^{n} (a_i - \bar{a})^r}{\left[\sum_{i=1}^{n} (a_i - \bar{a})^2\right]^{r/2}} = o[n^{(2-r)/4}], \qquad (r = 3, 4, \dots), \tag{3.1} \]

where $\bar{a} = n^{-1} \sum_{i=1}^{n} a_i$.

Remark: The set $\mathfrak{A}_n$ need not be a subset of $\mathfrak{A}_{n+1}$.

The proof of this theorem will be omitted, since it is very similar to the proof
of another theorem by the author [4].

THEOREM 2: If the $a_1, a_2, \dots$ are independent observations on a chance
variable $X$ having positive variance and a finite absolute moment of order
$4 + \delta$, $\delta > 0$, condition (3.1) is satisfied unless possibly an event
of probability 0 has occurred.

The proof of this theorem will be based on Markoff's method for proving the
central limit theorem in the Liapounoff form.² Thus we shall show that there
exists a sequence of sequences $\mathfrak{B}_n = (b_{n1}, \dots, b_{nn})$ such
that unless possibly an event of probability 0 has occurred, (i) there exists an
index $n'$ (depending on the given sequence) such that for $n > n'$,
$\mathfrak{A}_n = \mathfrak{B}_n$, and (ii) the sequences $\mathfrak{B}_n$
satisfy condition (3.1) expressed in terms of the $b_{ni}$,
$(i = 1, \dots, n)$.

2 See, for example, Uspensky [5], pp. 388-95.

It is no restriction to assume that $EX = 0$, since the addition of one and the
same constant to every $a_i$ does not change (3.1). Let

\[ N = N(n) = n^{1/(4 + \delta/2)}, \]

and define for $i = 1, \dots, n$
\[ b_{ni} = a_i, \quad c_{ni} = 0, \quad \text{if } |a_i| \le N(n), \]
\[ b_{ni} = 0, \quad c_{ni} = a_i, \quad \text{if } |a_i| > N(n), \]

so that $a_i = b_{ni} + c_{ni}$. Then $b_{ni}$ and $c_{ni}$ can be considered as
observations on chance variables $Y_n$ and $Z_n$, respectively, where

\[ Y_n = X, \quad Z_n = 0, \quad \text{if } |X| \le N(n), \]
\[ Y_n = 0, \quad Z_n = X, \quad \text{if } |X| > N(n). \]

Further let $p_n = P\{Z_n = X\}$, $\alpha_r(U) = EU^r$, $\beta_r(U) = E|U|^r$
where $U = X, Y_n, Z_n$ and $r$ is positive integral, if these moments exist,
$\beta_{4+\delta} = E|X|^{4+\delta}$, and finally, let $F(x)$ be the cdf of $X$.

In order to prove (i) consider the infinitely dimensional sample space $\Omega$
with the generic point $\omega = \omega(a_1, a_2, \dots)$ and let
$E_n = \{\omega : |a_n| > N(n)\}$, $(n = 1, 2, \dots)$. Then $E_n$ has
probability measure $p_n$. We shall show that $\sum_{n=1}^{\infty} p_n$
converges. Since

\[ \beta_{4+\delta} = \int_{-\infty}^{\infty} |x|^{4+\delta}\, dF(x)
   \ge N^{4+\delta} \left[\int_{-\infty}^{-N} dF(x) + \int_{N}^{\infty} dF(x)\right] \ge N^{4+\delta} p_n, \]

\[ p_n \le \frac{\beta_{4+\delta}}{N^{4+\delta}} = \frac{\beta_{4+\delta}}{n^{(4+\delta)/(4+\delta/2)}}. \]

Now $(4 + \delta)/(4 + \delta/2) > 1$ and the infinite sum converges. It follows
that the set $E$ of points which belong to infinitely many sets $E_n$ has
probability measure 0. Thus for every point $\omega \in \Omega$ except those in a
set of measure 0 there exists an index $n_\omega$ (depending on $\omega$) such
that for $n > n_\omega$

\[ |a_n| \le N(n). \tag{3.2} \]

Further, since $n_\omega$ is finite and $N(n) \to \infty$, it follows that for
these points there exists a second index $n'_\omega \ge n_\omega$ such that in
addition to (3.2) $|a_n| \le N(n'_\omega)$, $(n = 1, \dots, n_\omega)$. Thus
except on a set of measure 0 the sequences $\mathfrak{A}_n$ are identical with
the sequences $\mathfrak{B}_n$ for $n > n'_\omega$. This proves (i).

In proving (ii) let $B_{nr} = \sum_{i=1}^{n} b_{ni}^r$,
$(n, r = 1, 2, \dots)$. We first note that under the assumptions of the theorem
$n^{-1} A_r \to \alpha_r(X)$ for $r = 1, 2, 3, 4$ except on a set of measure 0.
Thus except on a set of measure 0

\[ \bar{a} = n^{-1} A_1 = o(1), \quad A_2 = \Omega(n),^3 \quad A_3 = O(n), \quad A_4 = O(n), \]

and therefore by the argument used in proving (i) again except on a set of
measure 0

\[ \bar{b}_n = n^{-1} B_{n1} = o(1), \quad B_{n2} = \Omega(n), \quad B_{n3} = O(n), \quad B_{n4} = O(n). \]
It follows that in order to prove (ii) it is sufficient to show that
\[ B_{nr} = o[n^{(r+2)/4}], \qquad (r = 5, 6, \dots), \tag{3.3} \]
except on a set of measure 0.

3 A function $f(n)$ is said to be of order $\Omega(n^k)$, $k$ real, if
$f(n) = O(n^k)$ and $\liminf |f(n)/n^k| > 0$.

Now for $r \ge 5$
\[ \alpha_r(Y_n) \le \beta_r(Y_n) \le N^{r-4} \beta_4(Y_n) \le N^{r-4} \beta_4(X), \]
and therefore
\[ \alpha_r(Y_n) = O(N^{r-4}) = O[n^{(r-4)/(4+\delta/2)}]. \]
It follows that
\[ E B_{nr} = n \alpha_r(Y_n) = O[n^{(r+\delta/2)/(4+\delta/2)}] \]
and
\[ \operatorname{var} B_{nr} = n \operatorname{var} Y_n^r = n[\alpha_{2r}(Y_n) - \alpha_r^2(Y_n)] = O[n^{(2r+\delta/2)/(4+\delta/2)}] \]
so that
\[ \sigma(B_{nr}) = O[n^{(r+\delta/4)/(4+\delta/2)}]. \]

Assume now that for some $r \ge 5$ (3.3) is not satisfied on a set $F_r$ having
measure $\varphi_r \ge \epsilon > 0$. We shall show that this assumption leads to
a contradiction, and that therefore (3.3) is true.

Choose $e$ such that

\[ 1/2 < e < (16 + r\delta)/(32 + 4\delta). \tag{3.4} \]

Since $r \ge 5$, (3.4) can always be satisfied. Then the infinite sum
$\sum_{n=1}^{\infty} (1/n^{2e})$ converges, and a positive constant $d$ can be
found in such a way that

\[ \frac{1}{d^2} \sum_{n=1}^{\infty} \frac{1}{n^{2e}} = \rho < \epsilon. \]
If we then write the Tchebysheff inequality
\[ P\{|B_{nr} - E B_{nr}| > d n^{e} \sigma(B_{nr})\} \le 1/d^2 n^{2e}, \]
it is seen that except on a set having at most measure $\rho$
\[ B_{nr} = O\{\max[n^{(r+\delta/2)/(4+\delta/2)},\; n^{e + (r+\delta/4)/(4+\delta/2)}]\}. \]

Now for $r \ge 5$

\[ (r + \delta/2)/(4 + \delta/2) < (r + 2)/4 \]
and by (3.4)

\[ e + (r + \delta/4)/(4 + \delta/2) = e + r/4 + (\delta/4 - r\delta/8)/(4 + \delta/2)
   < r/4 + (16 + 2\delta)/(32 + 4\delta) = (r + 2)/4, \]
so that the measure of the set $F_r$ is not even equal to $\epsilon$. This
contradicts our assumption, thus proving Theorem 2.


4. Consistency. To prove consistency of tests based on permutations of
observations $a_1, \dots, a_n$ the following procedure can be applied. Let the
test statistic be $S_n = S(x_1, \dots, x_n)$ and denote by
$E^0 S_n = E^0(a_1, \dots, a_n)$ and $V^0 S_n = V^0(a_1, \dots, a_n)$ the
expected value and variance of $S_n$ under the assumption that the set of random
variables $X_1, \dots, X_n$ is restricted to the subpopulation consisting of the
$n!$ equally likely permutations of the observations. Assume that for the
alternatives under consideration large values of $S_n$ are critical. Then we
reject the null hypothesis whenever $(S_n - E^0 S_n)/\sqrt{V^0 S_n} > k$ where
$k$ is some positive constant depending on the limiting distribution of $S_n$
under the assumption of equally likely permutations and the level of
significance. Thus in order to prove consistency we have to show that

\[ \lim_{n \to \infty} P\left\{\frac{S_n - E^0 S_n}{\sqrt{V^0 S_n}} > k \,\Big|\, H_1\right\} = 1. \tag{4.1} \]

(4.1) will be satisfied if for some $\epsilon > 0$

\[ \lim_{n \to \infty} P\left\{\frac{S_n - E^0 S_n}{\sqrt{n V^0 S_n}} > \epsilon \,\Big|\, H_1\right\} = 1. \]
Thus we shall have proved consistency, if we can show that when $H_1$ is true,
$E^0 S_n/\sqrt{n V^0 S_n}$ converges in probability to 0 and there exists some
$\epsilon > 0$ such that
$\lim_{n \to \infty} P\{S_n/\sqrt{n V^0 S_n} > \epsilon \mid H_1\} = 1$.

Applying this method to our problem and noting that a corresponding procedure
could have been used in the case when small values of $S_n$ are critical, we
obtain

THEOREM 3: The test based on $R_h$ is consistent with respect to alternatives for
which

\[ \operatorname*{plim}_{n \to \infty} \frac{E^0 R_h}{\sqrt{n V^0 R_h}} = 0 \tag{4.2} \]
and there exists some $\epsilon > 0$ such that

\[ \lim_{n \to \infty} P\left\{\left|\frac{R_h}{\sqrt{n V^0 R_h}}\right| > \epsilon \,\Big|\, H_1\right\} = 1, \tag{4.3} \]

where $E^0 R_h$ and $V^0 R_h$ are given by (2.1) and (2.2), respectively.

In what follows it will always be assumed that under the alternative hypothesis
observations are independent from chance variables $X_n$ with continuous cdf's
$F_n(x)$, $(n = 1, 2, \dots)$. We shall often have the opportunity to make use of
the fact that the test is not changed if one and the same constant is subtracted
from every observation. This will be helpful in reducing our problem to one for
which (4.2) is true.

Let $a_i$ be the rank of the observation $x_i$ on the chance variable $X_i$,
$(i = 1, \dots, n)$. Then it is no restriction to assume that these ranks take
the special form

\[ -\frac{n - 1}{2},\; -\frac{n - 3}{2},\; \dots,\; \frac{n - 1}{2}, \]

so that $A_1 = 0$, $A_2 = \frac{1}{12}(n^2 - 1)n = \Omega(n^3)$ and

\[ V^0 R_h \sim \frac{1}{n} A_2^2 \sim \frac{1}{144} n^5 = \Omega(n^5), \tag{4.4} \]

and therefore (4.2) is always satisfied.
Before we can find conditions under which (4.3) is satisfied, we have to
investigate the expected value and variance of $R_h$ when $H_1$ is true. For this
purpose write $a_i = \sum_{j=1}^{n} y_{ij}$, $(i = 1, \dots, n)$, where

\[ y_{ij} = \begin{cases} -1/2 & \text{if } x_i > x_j, \\ \phantom{-}0 & \text{if } i = j, \\ \phantom{-}1/2 & \text{if } x_i < x_j. \end{cases} \tag{4.5} \]


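Then if $P\{X_i < X_j\} = p_{ij}$, $(i, j = 1, \dots, n)$, we find
\[ E y_{ij} = \tfrac{1}{2} p_{ij} - \tfrac{1}{2}(1 - p_{ij}) = p_{ij} - \tfrac{1}{2} = \epsilon_{ij} \quad \text{(say)}. \]

Further,

\[ R_h = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} y_{ij}\, y_{i+h,k}, \qquad y_{n+j,k} = y_{jk}. \tag{4.6} \]

Therefore

\[ E(R_h \mid H_1) = \sum_{i,j,k} \epsilon_{ij}\, \epsilon_{i+h,k} + O(n^2) \tag{4.7} \]

and

\[ \operatorname{var} R_h = E \sum_{ijk} \sum_{\alpha\beta\gamma} y_{ij}\, y_{i+h,k}\, y_{\alpha\beta}\, y_{\alpha+h,\gamma}
   - E \sum_{ijk} y_{ij}\, y_{i+h,k}\; E \sum_{\alpha\beta\gamma} y_{\alpha\beta}\, y_{\alpha+h,\gamma}
   = \sum_{ijk} \sum_{\alpha\beta\gamma} (E y_{ij}\, y_{i+h,k}\, y_{\alpha\beta}\, y_{\alpha+h,\gamma} - E y_{ij}\, y_{i+h,k}\; E y_{\alpha\beta}\, y_{\alpha+h,\gamma}). \tag{4.8} \]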




In (4.8) the expression in parentheses is 0 unless one of the Greek indices
(including $\alpha + h$) equals one of the Roman indices. Therefore
$\operatorname{var}(R_h \mid H_1) = O(n^5)$. It then follows from (4.4) that

\[ R_h/\sqrt{n V^0 R_h} \sim \frac{12}{n^3} R_h \xrightarrow{P} 12 \lim_{n \to \infty} \frac{1}{n^3} E(R_h \mid H_1), \]

and we can state the following corollary to Theorem 3:

COROLLARY: When using ranks, the test based on $R_h$ is consistent, if under the
alternative hypothesis

\[ \frac{1}{n^3} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \epsilon_{ij}\, \epsilon_{i+h,k} = \Omega(1), \tag{4.9} \]

where $\epsilon_{ij} = P\{X_i < X_j\} - \tfrac{1}{2}$.

Since $\epsilon_{ij} = -\epsilon_{ji}$, we can write
\[ \sum_{i,j,k} \epsilon_{ij}\, \epsilon_{i+h,k} = \sum_{k} \sum_{i<j} \epsilon_{ij} (\epsilon_{i+h,k} - \epsilon_{j+h,k}) = L \quad \text{(say)}, \]
and the test is consistent if

\[ \lim_{n \to \infty} \frac{1}{n^3} L \ne 0. \tag{4.10} \]


4.1. Downward (upward) trend. Assume that for $i < j$ and all $k$
\[ \epsilon_{ij} < 0 \tag{4.11} \]
and
\[ \epsilon_{ik} \le \epsilon_{jk}. \tag{4.12} \]

These requirements are equivalent to $P\{X_i < X_j\} < 1/2$ and
$P\{X_i < X_k\} \le P\{X_j < X_k\}$ and are satisfied if the alternative to
randomness is a downward trend in the sense that $F_i(x) \le F_j(x)$,
$(-\infty < x < \infty,\ i < j)$, with at least one interval of strict
inequality.

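(4.11) and (4.12) are not sufficient for (4.10) to be true. Thus assume in
addition that there exist a positive integer $n'$ and a number $\epsilon < 0$
such that l.u.b.$_{j - i \ge n'}\ \epsilon_{ij} = \epsilon$. Then

\[ \lim_{n \to \infty} \frac{1}{n^3} L
   \ge \lim_{n \to \infty} \frac{1}{n^3} \sum_{k} \sum_{\substack{i \le k - h - n' \\ j \ge k - h + n'}} \epsilon_{ij} (\epsilon_{i+h,k} - \epsilon_{j+h,k})
   \ge 2\epsilon^2 \lim_{n \to \infty} \frac{1}{n^3} \sum_{k=1}^{n} (k - h - n')(n - k + h - n' + 1)
   = 2\epsilon^2 \left(\tfrac{1}{2} - \tfrac{1}{3}\right) > 0, \]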

and the test is consistent. 
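The quantity in (4.9) can be evaluated directly for a concrete trend. The sketch below is only an illustration: it assumes independent unit-variance normal observations whose means decrease linearly (a model chosen here for convenience, not taken from the paper), so that P{X_i < X_j} has a closed form. The names `eps` and `normalized_triple_sum` are hypothetical.

```python
import math


def eps(mu_i, mu_j):
    # epsilon_ij = P{X_i < X_j} - 1/2 for independent N(mu_i, 1) and N(mu_j, 1):
    # P{X_i < X_j} = Phi((mu_j - mu_i)/sqrt(2)) = 0.5 * (1 + erf((mu_j - mu_i)/2)).
    return 0.5 * math.erf((mu_j - mu_i) / 2.0)


def normalized_triple_sum(mu, h):
    """n^{-3} * sum_{i,j,k} eps_ij * eps_{i+h,k}, with the index i+h taken circularly."""
    n = len(mu)
    row = [sum(eps(mu[i], mu[j]) for j in range(n)) for i in range(n)]
    return sum(row[i] * row[(i + h) % n] for i in range(n)) / n ** 3


if __name__ == "__main__":
    trend = [-0.5 * i for i in range(60)]   # assumed downward trend in the means
    flat = [0.0] * 60                        # randomness: identical distributions
    print(normalized_triple_sum(trend, h=1))
    print(normalized_triple_sum(flat, h=1))
```

Under the assumed trend the normalized sum stays away from zero, while under randomness it vanishes, which is the contrast the corollary relies on.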

The case of an upward trend can be treated in exactly the same way. The
test is consistent with respect to alternatives for which for $i < j$ and all
$k$, $\epsilon_{ij} > 0$, $\epsilon_{ik} \ge \epsilon_{jk}$, and
g.l.b.$_{j - i \ge n'}\ \epsilon_{ij} = \epsilon$, where this time
$\epsilon > 0$.

Another test of randomness, the so-called 7T-test, has been proposed by 
Mann [6] with exactly this alternative of a downward (upward) trend in mind. 
This T-test is also consistent provided certain general conditions are satisfied. 
Thus the question arises which of the two tests should be chosen if a downward 
(upward) trend is feared. This question will be considered in Section 7. 

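4.2. Cyclical movement. Let the class of alternatives be specified by

\[ \epsilon_{lg+\alpha,\, mg+\beta} = \epsilon_{\alpha\beta}, \qquad (\alpha, \beta = 1, \dots, g > 1;\ l, m = 0, 1, \dots), \tag{4.13} \]

in other words, assume that the statistic $R_h$ is used to test for randomness
while under the alternative hypothesis there exists a regular cyclical movement
with a period of length $g$. It is sufficient to consider the case $h \le g$.

If (4.13) is true,

\[ \sum_{i,j,k=1}^{n} \epsilon_{ij}\, \epsilon_{i+h,k} = n^2 \sum_{i=1}^{n} \epsilon_{i\cdot}\, \epsilon_{i+h,\cdot} + O(n^2) = n^3 \eta + O(n^2), \tag{4.14} \]
where
\[ \epsilon_{\alpha\cdot} = \frac{1}{g} \sum_{\beta=1}^{g} \epsilon_{\alpha\beta} \tag{4.15} \]
and
\[ \eta = \frac{1}{g} \sum_{\alpha=1}^{g} \epsilon_{\alpha\cdot}\, \epsilon_{\alpha+h,\cdot}. \tag{4.16} \]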
Thus in view of (4.9) the test is consistent if 7 ¥ 0. 

If $h = g$, $\eta$ reduces to a sum of squares and is therefore $> 0$ if some
$\epsilon_{\alpha\cdot} \ne 0$. However it is possible that some or even all
$\epsilon_{\alpha\beta} \ne 0$, $(\alpha \ne \beta)$, and still
$\epsilon_{\alpha\cdot} = 0$. If this happens, the test is inconsistent,
otherwise it is consistent. If under $H_1$ the populations from which consecutive
observations are drawn differ only in location, the above mentioned exceptional
case cannot happen, and the test is always consistent with respect to this class
of alternatives.

If $h < g$, it is not difficult to construct an example where
$\sum_{\alpha=1}^{g} \epsilon_{\alpha\cdot}\, \epsilon_{\alpha+h,\cdot} \ne 0$
while $\sum_{\alpha=1}^{g} \epsilon_{r_\alpha\cdot}\, \epsilon_{r_{\alpha+h}\cdot} = 0$,
where the $r_\alpha$ are a permutation of the numbers $1, \dots, g$. Thus in this
case it is not sufficient that some $\epsilon_{\alpha\cdot} \ne 0$ for the test
to be consistent. Consistency may also depend on the order of the elements of a
period.

We may conclude that if g is known, we should always choose h = g. If g 
is not known, we may as well take h = 1. 
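The dependence of consistency on the choice of h can be illustrated with a small computation of the quantity η. The sketch below makes several assumptions that are not in the text: the observations are independent unit-variance normals with periodic means, the row quantity for each phase of the cycle is read as the average of the corresponding probabilities minus one half over one period, and the period (1, 0, -1, 0) is an arbitrary example. The function name `eta` is hypothetical.

```python
import math


def eta(m, h):
    """(1/g) * sum_a e_a * e_{a+h}, with e_a the average over b of P{X_a < X_b} - 1/2.

    m lists the assumed means over one period of length g = len(m); the
    observations are taken to be independent normals with unit variance.
    """
    g = len(m)
    e = [sum(0.5 * math.erf((m[b] - m[a]) / 2.0) for b in range(g)) / g
         for a in range(g)]
    return sum(e[a] * e[(a + h) % g] for a in range(g)) / g


if __name__ == "__main__":
    period = [1.0, 0.0, -1.0, 0.0]   # illustrative cycle of length g = 4
    for h in (1, 2, 4):
        print(h, eta(period, h))
```

For this particular cycle the value with h = 1 vanishes, although the cycle is far from random, while h = g = 4 gives a positive sum of squares.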

4.3. Change in location. Turning now to the case when the test is performed
on the basis of the original observations, it will often be appropriate to assume
that under the alternative hypothesis the distribution remains the same except
for a location parameter. We shall consider only the case of a cyclical movement.

Thus let

\[ F_n(x) = F(x - m_n), \qquad (n = 1, 2, \dots), \]

where $F(x)$ is the cdf of a chance variable $U$ having mean 0, and $m_n$ is a
location parameter. It will also be assumed that $U$ has the positive variance
$\sigma^2$ and a finite fourth moment.
In the cyclical case with period $g$

\[ m_{lg+\alpha} = m_\alpha, \qquad (\alpha = 1, \dots, g > 1;\ l = 0, 1, \dots). \tag{4.17} \]
We shall find conditions under which our test is consistent with respect to
alternatives of this kind. Obviously we can assume that
$\sum_{\alpha=1}^{g} m_\alpha = g\bar{m} = 0$, since otherwise we could have
subtracted $\bar{m}$ from every observation. Writing then $a_n = u_n + m_n$,
$(n = 1, 2, \dots)$, where $u_n$ can be considered as an observation on the
previously defined chance variable $U$, we find

\[ A_1 = \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} u_i + O(1), \]

\[ A_2 = \sum_{i=1}^{n} u_i^2 + 2 \sum_{i=1}^{n} u_i m_i + \sum_{i=1}^{n} m_i^2
   = \sum_{i=1}^{n} u_i^2 + 2 \sum_{\alpha=1}^{g} m_\alpha \sum_{i=0}^{n_\alpha} u_{ig+\alpha} + \left[\frac{n}{g}\right] \sum_{\alpha=1}^{g} m_\alpha^2 + O(1), \]
where $n_\alpha$ is the largest integer such that $n_\alpha g + \alpha \le n$ and
$[n/g]$ the largest integer $\le n/g$. $A_3$ and $A_4$ are given by similar
expressions. Since we assumed that $EU = 0$, $EU^2 = \sigma^2 > 0$, and
$EU^4 < \infty$, we have with probability 1
\[ \sum_{i=1}^{n} u_i = o(n), \quad \sum_{i=1}^{n} u_i^2 = \Omega(n), \quad \sum_{i=1}^{n} u_i^3 = O(n), \quad \sum_{i=1}^{n} u_i^4 = O(n), \]
so that with the same probability
\[ A_1 = o(n), \quad A_2 = \Omega(n), \quad A_3 = O(n), \quad A_4 = O(n). \]
It follows that with probability 1

$$E^0 R_h = o(n), \qquad V^0 R_h \sim \frac{1}{n} A_2^2 = \Omega(n),$$

and condition (4.2) of Theorem 3 is satisfied.
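Since further

$$\operatorname{var} R_h = \sum_{i=1}^{n} \operatorname{var}(x_i x_{i+h}) + 2\sum_{i=1}^{n} \operatorname{cov}(x_i x_{i+h},\ x_{i+h} x_{i+2h})$$

$$= \sum_{i=1}^{n} \{(\sigma^2 + m_i^2)(\sigma^2 + m_{i+h}^2) - m_i^2 m_{i+h}^2\} + 2\sum_{i=1}^{n} \{m_i m_{i+2h}(\sigma^2 + m_{i+h}^2) - m_i m_{i+h}^2 m_{i+2h}\}$$

$$= \sigma^2 \sum_{i=1}^{n} \{\sigma^2 + (m_i^2 + m_{i+h}^2 + 2 m_i m_{i+2h})\} = O(n)$$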


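and therefore, except on a set of probability measure 0,

$$\lim_{n\to\infty} \frac{E(R_h \mid H_1)}{\sqrt{n}\,\sqrt{V^0 R_h}} = \lim_{n\to\infty} \frac{\frac{1}{n} E(R_h \mid H_1)}{\sigma^2 + \frac{1}{g}\sum_{\alpha=1}^{g} m_\alpha^2},$$

condition (4.3) is satisfied provided $\lim_{n\to\infty} \frac{1}{n} E(R_h \mid H_1) \neq 0$. Now $E(R_h \mid H_1) = [n/g] \sum_{\alpha=1}^{g} m_\alpha m_{\alpha+h} + O(1)$, so that the test is consistent with respect to the class of alternatives (4.17) for which

$$\sum_{\alpha=1}^{g} (m_\alpha - \bar m)(m_{\alpha+h} - \bar m) \neq 0,$$

where $\bar m = g^{-1} \sum_{\alpha=1}^{g} m_\alpha$. Thus by the same argument as in the case of ranks, the test is consistent whenever $h = g$, while it may or may not be consistent if $h < g$.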
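The consistency condition for cyclical location alternatives can be illustrated numerically. The following is a minimal sketch in Python; the function name `lagged_cycle_sum` and the example cycle are illustrative assumptions and do not come from the paper. It evaluates $\sum_\alpha (m_\alpha - \bar m)(m_{\alpha+h} - \bar m)$ over one period, with indices taken modulo $g$.

```python
def lagged_cycle_sum(cycle, h):
    """Sum over one period of (m_a - mean) * (m_{a+h} - mean), indices mod g."""
    g = len(cycle)
    mean = sum(cycle) / g
    centered = [value - mean for value in cycle]
    return sum(centered[a] * centered[(a + h) % g] for a in range(g))


# Hypothetical cycle of period g = 4 with mean 0.
cycle = [1.0, 0.0, -1.0, 0.0]

full_period = lagged_cycle_sum(cycle, 4)  # h = g: a sum of squares
lag_one = lagged_cycle_sum(cycle, 1)      # h < g: may vanish
lag_two = lagged_cycle_sum(cycle, 2)      # h < g: here nonzero
```

For this particular cycle the lag-one sum vanishes although the cycle is not constant, which is the kind of case the text has in mind when it says the test may or may not be consistent for $h < g$.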


5. Limiting distribution of $R_h$ under $H_1$ in case of ranks. For the remaining two sections, it is of importance to know conditions under which $R_h$ based on ranks is asymptotically normal under the alternative hypothesis. Using the method of moments, it can be shown that in this case the distribution of $(R_h - ER_h)/\sigma(R_h)$ tends to the normal distribution with mean 0 and variance 1 provided $\operatorname{var} R_h = \Omega(n^5)$.

Generalizing the method used in Section 4 in evaluating the variance of $R_h$, it is not difficult to see that $E(R_h - ER_h)^{2s+1} = O(n^{5s+2})$, $(s = 0, 1, \cdots)$. It follows that if $\operatorname{var} R_h = \Omega(n^5)$, the odd moments are asymptotically zero. By means of a more careful analysis, it is also possible to show that $E(R_h - ER_h)^{2s} \sim (2s - 1)(2s - 3) \cdots 3\,(\operatorname{var} R_h)^s$. This proves our statement.


6. Ranks versus original observations. We have seen in Section 4 that if the alternative hypothesis is characterized by a regular cyclical movement the test based on $R_h$ is consistent both for original observations and for ranks, provided $h = g$, where $g$ is the length of a cycle. The question arises which test is more efficient, the one based on original observations or the one based on ranks.

In trying to answer this question, we shall make use of a procedure due to Pitman⁴, which allows us to compare two consistent tests of the hypothesis that some population parameter $\theta$ has the value $\theta_0$ against the alternatives $\theta > \theta_0$ using critical regions of size $\alpha$, $S_{in} \ge S_{in}(\alpha)$, $(i = 1, 2)$, where $S_{in}$ is a statistic having finite variance and $S_{in}(\alpha)$ is an appropriate constant. The relative efficiency of the second test with respect to the first test is defined as the ratio $n_1/n_2$ where $n_2$ is the sample size of the second test required to achieve the same power for a given alternative as is achieved by the first test using a sample of size $n_1$ with respect to the same alternative.

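Let $E(S_{in} \mid \theta) = \psi_{in}(\theta)$, $\operatorname{var}(S_{in} \mid \theta) = \sigma_{in}^2(\theta)$, and $\psi_{in}'(\theta_0)/\sigma_{in}(\theta_0) = H_i(n)$. Assuming that the alternative is of the form $\theta_n = \theta_0 + k/\sqrt{n}$ where $k$ is a positive constant, Pitman has shown that the asymptotic relative efficiency of the second test with respect to the first test is given by $\lim_{n\to\infty} [H_2^2(n)/H_1^2(n)]$, provided there exists a number $\epsilon > 0$ such that for $\theta_0 \le \theta \le \theta_0 + \epsilon$

(6.1) $\psi_{in}'(\theta)$ exists;

as $\theta_n \to \theta_0$ with $n \to \infty$

(6.2) $\lim \dfrac{\psi_{in}'(\theta_n)}{\psi_{in}'(\theta_0)} = 1$;

(6.3) $\lim \dfrac{\sigma_{in}(\theta_n)}{\sigma_{in}(\theta_0)} = 1$;

(6.4) $\lim_{n\to\infty} \dfrac{1}{\sqrt{n}} H_i(n) = c_i$, where $c_i$ is some positive constant;

(6.5) the distribution of $[S_{in} - \psi_{in}(\theta)]/\sigma_{in}(\theta)$ tends to the normal distribution with mean 0 and variance 1 uniformly in $\theta$.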

⁴ I should like to thank Professor Pitman for his kind permission to quote from his lectures on non-parametric statistical inference which he delivered at Columbia University during the spring semester 1948.
Condition (6.5) can be replaced by the weaker condition

(6.5') the distribution of $[S_{in} - \psi_{in}(\theta_n)]/\sigma_{in}(\theta_n)$ tends to the normal distribution with mean 0 and variance 1 as $n \to \infty$.

In our case, in order to insure consistency, it will be assumed that $h = g$. Consider the parameter

(6.6) $$\delta = \frac{1}{h} \sum_{\alpha=1}^{h} (m_\alpha - \bar m)^2,$$

where as before $m_\alpha$ is the expected value of the $(lh + \alpha)$th observation, $(l = 0, 1, \cdots)$. We want to find the asymptotic relative efficiency of the test performed on ranks with respect to the test performed on original observations as $\delta \to 0$ with $n \to \infty$.

Again it is no restriction to assume that

(6.7) $$\bar m = \frac{1}{h} \sum_{\alpha=1}^{h} m_\alpha = 0.$$
Assume further that the chance variable $U$ defined in 4.3 has a finite absolute moment of order $4 + \delta$, $\delta > 0$. Then $R_h^0 \sim \sqrt{n}\,R_h/A_2$ with probability 1 and, if the null hypothesis is true, it follows from Theorem 2 that with the same probability the statistic

$$Q_n = \frac{\sqrt{n}\,\sum_{i=1}^{n} x_i x_{i+h}}{\sum_{i=1}^{n} x_i^2}$$

has in the population of permutations of the observed sample values an asymptotically normal distribution with mean 0 and variance 1. This, however, is also the limiting distribution of $Q_n$ under random sampling when the null hypothesis is true, as follows from the results of Hoeffding and Robbins [3]. Thus it will be sufficient to find the asymptotic relative efficiency of the $R_h$-test for ranks with respect to the $Q_n$-test. In doing this, it will also be assumed that $U$ has a continuous density function $f(x) = F'(x)$, and, in order to simplify notation, that there are $nh$ observations instead of $n$.

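In finding $H_Q(nh)$, let $x_{\alpha,j} = x_{\alpha j} = x_{(j-1)h+\alpha}$ and $u_{\alpha,j} = u_{\alpha j} = u_{(j-1)h+\alpha}$, $(\alpha = 1, \cdots, h;\ j = 1, \cdots, n)$. Then

$$\frac{1}{nh}\sum_{i=1}^{nh} x_i^2 = \frac{1}{nh}\sum_{\alpha=1}^{h}\sum_{j=1}^{n} x_{\alpha j}^2 = \frac{1}{nh}\sum_{\alpha=1}^{h}\sum_{j=1}^{n} (u_{\alpha j} + m_\alpha)^2$$

$$= \frac{1}{nh}\sum_{\alpha=1}^{h}\sum_{j=1}^{n} u_{\alpha j}^2 + \frac{2}{nh}\sum_{\alpha=1}^{h} m_\alpha \sum_{j=1}^{n} u_{\alpha j} + \frac{1}{h}\sum_{\alpha=1}^{h} m_\alpha^2 \sim \sigma^2 + \delta.$$

Further,

$$R_h = \sum_{\alpha=1}^{h} \left[\sum_{j=1}^{n} u_{\alpha j} u_{\alpha,j-1} + 2 m_\alpha \sum_{j=1}^{n} u_{\alpha j} + n m_\alpha^2\right]$$

so that

$$EQ_n = E\,\frac{\sqrt{nh}\,\frac{1}{nh} R_h}{\frac{1}{nh}\sum x_i^2} \sim \frac{\sqrt{nh}\,\delta}{\sigma^2 + \delta} = \psi_{Qn}(\delta).$$

Therefore

$$\psi_{Qn}'(\delta) = \sqrt{nh}\,\frac{\sigma^2}{(\sigma^2 + \delta)^2}.$$

Also by (4.18)

$$\sigma_{Qn}^2(\delta) \sim \frac{nh\,\sigma^4 + 4n\sigma^2 \sum_{\alpha=1}^{h} m_\alpha^2}{nh(\sigma^2 + \delta)^2} = \frac{\sigma^4 + 4\sigma^2\delta}{(\sigma^2 + \delta)^2}$$

which converges to 1 as $\delta \to 0$. It follows that

(6.8) $$H_Q(nh) = \psi_{Qn}'(0) = \frac{\sqrt{nh}}{\sigma^2}.$$

Conditions (6.1)-(6.5) are easily seen to be satisfied.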
Considering now the $R_h$-test for ranks, we know that $(nh)^{-5/2} R_h$ has finite variance. From (4.7) and (4.14)-(4.16) it is found that


9 


(6.9) El(mh)°" R, |) ~ Vnhn = Vah a a ( 2d ca) = Vrn(6) 


and after some computations

(6.10) $$\psi_{Rn}'(0) = \sqrt{nh}\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2.$$

From (4.4) and (6.10)

$$H_R(nh) = 12\sqrt{nh}\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2.$$

Conditions (6.1)-(6.4) and (6.5') can be shown to be satisfied.

Thus the asymptotic relative efficiency of the test based on ranks with respect to the test based on original observations is

(6.11) $$\frac{H_R^2}{H_Q^2} = \frac{144\,nh\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^4}{nh/\sigma^4} = \left[12\sigma^2\left(\int_{-\infty}^{\infty} f^2(x)\,dx\right)^2\right]^2.$$

As is not difficult to see, this expression is independent of location and scale.

Let the chance variable $U$ have density function

$$f(x) = \begin{cases} 0, & x < -1,\ x > 1, \\ \dfrac{1 + x}{1 + a}, & -1 \le x \le a, \\ \dfrac{1 - x}{1 - a}, & a \le x \le 1, \end{cases}$$

i.e., let the graph of $f(x)$ be given by the two straight lines connecting the points $(-1, 0)$ and $(1, 0)$ with the point $(a, 1)$. Then $EU = a/3$, $\operatorname{var} U = \frac{1}{18}(3 + a^2)$, $\int_{-\infty}^{\infty} f^2(x)\,dx = 2/3$, and (6.11) becomes $[8(3 + a^2)/27]^2$. Thus $H_R^2/H_Q^2$ increases with $|a|$. For $a = 0$, it is equal to $64/81$; for $|a| = 1$, it is equal to $(32/27)^2$. It is equal to 1 for $a = \pm\sqrt{3/8}$.

This example shows that the asymptotic relative efficiency of the rank test 
with respect to the test based on original observations may be <1, =1, or >1, 
depending on the density function f(x). Unless f(x) is explicitly given, no state- 
ment can be made as to which of the two tests is to be preferred. 
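The closed form for the triangular example can be checked numerically. The sketch below is written in Python under stated assumptions: the helper names `simpson` and `relative_efficiency` are illustrative, and the efficiency is taken in the form $[12\sigma^2(\int f^2\,dx)^2]^2$, which is the form that reproduces the values $64/81$ and $(32/27)^2$ quoted in the text. Simpson's rule is used on each linear branch because it integrates the low-degree polynomials involved without discretization error.

```python
from math import sqrt


def simpson(func, lo, hi, steps=200):
    """Composite Simpson rule on [lo, hi]; `steps` must be even."""
    width = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * width)
    return total * width / 3.0


def relative_efficiency(a):
    """[12 * var(U) * (integral of f^2)^2]^2 for the triangular density, -1 < a < 1."""
    def left(x):   # branch on [-1, a]
        return (1.0 + x) / (1.0 + a)

    def right(x):  # branch on [a, 1]
        return (1.0 - x) / (1.0 - a)

    def integral(weight):
        return (simpson(lambda x: weight(x, left(x)), -1.0, a)
                + simpson(lambda x: weight(x, right(x)), a, 1.0))

    mean = integral(lambda x, fx: x * fx)
    second_moment = integral(lambda x, fx: x * x * fx)
    int_f_squared = integral(lambda x, fx: fx * fx)
    variance = second_moment - mean ** 2
    return (12.0 * variance * int_f_squared ** 2) ** 2


eff_symmetric = relative_efficiency(0.0)           # compare with 64/81
eff_break_even = relative_efficiency(sqrt(3 / 8))  # compare with 1
```

The endpoints $|a| = 1$ are excluded here only because one branch of the density then has zero width; the closed form $[8(3 + a^2)/27]^2$ covers them by continuity.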

We are now in a position to give at least a partial answer to a question raised in 
[1]. In concluding their paper, Wald and Wolfowitz note that the problem dealt 
with in this section can be posed not only when transforming to ranks, but also 
for any transformation carried out by means of a continuous and strictly mono- 
tonic function h(x). 

Let t = h(x) be such a transformation, satisfying in addition the condition that 
Pitman’s procedure remains applicable for the transformed distribution. Corre- 
sponding to o° and Q we shall use o7 and Q, . Let,h(m.) = wa, h tl (ua — a) 
= #. Then if EQ; ~ Wo,n(0), by (6.8), (6.9), and (6.10) 


dYoin(9)| _ dan dd dy | 
dé amo di dr dé \s— 


(6.12) Vahl? i 
ae { [ Foow*o ar} [fre ar | = Hun 





—™ 3 


where g(t) is the inverse of h(x). Therefore by (6.8) and (6.12) 


\° [ f(z) az} 
{e [ . fF [g@]o”" at 


and the asymptotic relative efficiency does not merely depend on h(x), the 
operator defining the transformation, but also very essentially on the underlying 
distribution f(z). 


He.9 os 


7. Comparison of the $R_h$- and $T$-tests. The $T$-test by Mann [6] designed to test for randomness against a downward trend is based on the statistic

$$T = \sum_{i}\sum_{j>i} (y_{ij} + \tfrac{1}{2}) = \sum_{i}\sum_{j>i} y_{ij} + \tfrac{1}{4} n(n - 1),$$

where $y_{ij}$ is defined by (4.5). Making the same assumptions as in 4.1, Mann shows that under the null hypothesis $T$ has a limiting normal distribution with mean $\frac{1}{4} n(n - 1)$ and variance $\frac{1}{72}(2n^3 + 3n^2 - 5n)$, while under the alternative hypothesis

(7.1) $$ET = \tfrac{1}{4} n(n - 1)(2\bar\epsilon_n + 1),$$

where $\bar\epsilon_n$ is defined by $\frac{1}{2} n(n - 1)\bar\epsilon_n = \sum_{i}\sum_{j>i} \epsilon_{ij}$, $\epsilon_{ij} \le 0$.

Let

$$S_n = \frac{6}{n^{3/2}}\left[T - \tfrac{1}{4} n(n - 1)\right].$$

When $H_0$ is true, $S_n$ is asymptotically normal with mean 0 and variance 1. If we then put

$$\phi(\lambda) = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-\frac{1}{2}x^2}\,dx,$$

a critical region for testing $H_0$ is given by $S_n < -\lambda$, where $\lambda$ is determined in such a way that $\phi(\lambda) = \alpha$, the level of significance.

When $H_1$ is true, we find from (7.1)

$$E(S_n \mid \bar\epsilon_n) \sim 3\sqrt{n}\,\bar\epsilon_n.$$

By paralleling the proof of asymptotic normality of $R_h$ under $H_1$ given in Section 5, it can be shown that $(S_n - ES_n)/\sigma(S_n)$ is asymptotically normal with mean 0 and variance 1 provided $\sigma(S_n) = \Omega(1)$. This is essentially the result obtained already by Hoeffding [7]. Thus the asymptotic power of the test based on $S_n$ is given by

(7.2) $$P\{S_n < -\lambda\} \sim \phi\left(\frac{\lambda + 3\sqrt{n}\,\bar\epsilon_n}{\sigma(S_n)}\right)$$

converging to 1, provided $\lim_{n\to\infty} \sqrt{n}\,\bar\epsilon_n = -\infty$. This is the condition for consistency given by Mann.

We may ask for the asymptotic power of the $S_n$-test as $\bar\epsilon_n \to 0$ with $n \to \infty$. More exactly, instead of considering a certain alternative $\epsilon_{ij} = k_{ij}$, where the $k_{ij}$ are given constants, consider the alternative (changing with $n$)

(7.3) $$\epsilon_{ij} = \frac{k_{ij}}{\sqrt{n}}.$$

If then as $n \to \infty$

$$\frac{2}{n(n - 1)} \sum_{i}\sum_{j>i} k_{ij} \to k$$

and

$$\sigma(S_n) \to 1,$$

it follows from (7.2) that the asymptotic power of the $S_n$-test, and therefore of the $T$-test, for alternatives (7.3) is equal to

$$\phi(\lambda + 3k).$$
Now consider the same situation when the statistic $R_h$ is used instead of $T$. We know that when $H_0$ is true


Rk. = R., 


nl? 
where R, is given by (4.6), is asymptotically normal with mean 0 and variance 1. 
Thus in this case the critical region is given by R., > X. Ifwe seté, = Session 
we find

$$E(R_h' \mid \xi_n) \sim 12\sqrt{n}\,\xi_n,$$

and asymptotically the power of the $R_h'$-test is

(7.4) $$P\{R_h' > \lambda\} \sim \phi\left(\frac{\lambda - 12\sqrt{n}\,\xi_n}{\sigma(R_h')}\right),$$

provided $\sigma(R_h') = \Omega(1)$. Thus the test is consistent if $\lim_{n\to\infty} \sqrt{n}\,\xi_n = \infty$. However, for the alternative (7.3), (7.4) tends to $\phi(\lambda) = \alpha$, provided that as $n \to \infty$

$$\sigma(R_h') \to 1.$$

Thus the $R_h$-test is ineffective with respect to the alternative (7.3) in contrast to the $T$-test. This means that for this alternative the asymptotic relative efficiency of the $R_h$-test with respect to the $T$-test is 0.


Acknowledgment. The author wishes to acknowledge the valuable help of 
THE DISTRIBUTION OF THE NUMBER OF EXCEEDANCES¹


By E. J. GUMBEL AND H. VON SCHELLING
New York and Naval Medical Research Laboratory, New London, Connecticut


0. The problem. We study the probability that the mth observation in a 
sample of size n taken from an unknown distribution of a continuous variate 
will be exceeded x times in N future trials, and calculate the averages, the 
moments, and the cumulative probability function of the number of exceedances. 
This problem leads to the hypergeometric series. Our starting point is a special 
case of a distribution studied by Wilks [3] who considered several order statistics 
whereas we consider only one. His tolerance limits are special cases of our 
cumulative probability function. Thus the present paper is, at the same time, a 
specialization and a generalization of the work done by Wilks. 


1. Distribution. From a continuous variate $\xi$ an alternative is constructed by choosing the $m$th among $n$ observations $\xi_m$ $(m = 1, 2, \cdots, n)$. The rank $m$ is counted from the top, which means that $m = 1$ $(m = n)$ stands for the largest (smallest) observation. The observation $\xi_m$ is thus the $m$th largest value. We ask: In how many cases $x$ will the past $m$th observation be equalled or exceeded in $N$ future trials taken from the same population? For the sake of simplicity, $x$ is called the number of exceedances.

If the initial probability $F(\xi_m) = F_m$ for a value less than $\xi_m$ is known, the alternative probability for exceeding $\xi_m$ is $1 - F_m$, and Bernoulli's theorem gives the probability

(1.1) $$w_1(F_m, N, x) = \binom{N}{x} (1 - F_m)^x F_m^{N-x}$$

that $x$ among $N$ future trials will exceed $\xi_m$. However, as a rule the probability $F_m$ is unknown. The only data known are the $n$ past observations. To eliminate the probability $F_m$, we introduce the distribution $v(F_m)$ of the frequency $F_m$ of the $m$th largest among $n$ values

(1.2) $$v(n, m, F_m)\,dF_m = \binom{n}{m} m\,F_m^{n-m} (1 - F_m)^{m-1}\,dF_m;$$

consider $F_m$ as a variate, and integrate (1.1) over all values of this variate. Thus $F_m$ is replaced by a function of $n$ and $m$.
¹ Opinions or conclusions contained in this paper are those of the authors. They are not to be construed as necessarily reflecting the views or endorsement of the Navy Department.

The convolution of (1.1) and (1.2) leads to the distribution $w(n, m, N, x)$ of the number of exceedances over the $m$th largest among $n$ observations in $N$ future trials

(1.3) $$w(n, m, N, x) = \frac{\binom{n}{m}\,m\,\binom{N}{x}}{(N + n)\binom{N + n - 1}{m + x - 1}}.$$

This probability depends upon the parameters $n$, $m$, and $N$, but not upon the unknown probability $F_m$. Therefore it is distribution-free. If we are interested in the dependence of $w(n, m, N, x)$ on $x$ only we simply write $w(x)$. The conditions for the positive integers $m$ and $x$, and for the probability $w(x)$ are

(1.3') $$1 \le m \le n; \qquad 0 \le x \le N; \qquad \sum_{x=0}^{N} w(x) = 1.$$
0 


The distribution (1.3) possesses the following symmetry

(1.4) $$w(n, m, N, x) = w(n, n - m + 1, N, N - x)$$

which reads: The probability that the past $m$th value from above will be exceeded $x$ times in $N$ new trials is equal to the probability that the past $m$th value from below will be exceeded $N - x$ times.

The $nN$ probabilities $w(n, m, N, x)$ are linked by several recurrence formulas which follow easily from the usual combinatorial rules. For fixed $m$, the probability for $x + 1$ is obtained from the probability for $x$ by

(1.5) $$w(n, m, N, x + 1) = w(n, m, N, x)\,\frac{(N - x)(m + x)}{(N + n - m - x)(x + 1)} = w(n, n - m + 1, N, N - x - 1).$$

In the same way, the probabilities $w(n, m, N, x + 1)$, $w(n, m + 1, N, x)$ and $w(n, x, N, m)$ are easily obtained from the probabilities $w(n, m, N, x)$. The distribution (1.3) has many aspects since, besides the number of exceedances $x$, also the rank $m$ and the number of future trials $N$ may be considered as variates.

For $m = 1$ and $m = n$, the distribution of the number of exceedances over the largest value diminishes with $x$, and the distribution of the number of exceedances over the smallest value increases with $x$. For $x = 0$, and $m = 1$, we obtain from (1.3)

(1.6) $$w(n, 1, N, 0) = \frac{n}{N + n} = w(n, n, N, N).$$

For $x = 0$, $m = n$, the probability that the smallest observation will never be exceeded, equal to the probability that the largest value will always be exceeded, is very small, even for moderate sample sizes.

If $n$ is odd, then $m = (n + 1)/2$ corresponds to the median of the initial variable $\xi$, and the symmetry relation (1.4) becomes

(1.7) $$w(n, (n + 1)/2, N, x) = w(n, (n + 1)/2, N, N - x).$$

It is equally probable that the median of the $n$ past observations is surpassed $x$ or $N - x$ times in $N$ future trials.
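The distribution (1.3) is easy to evaluate exactly with rational arithmetic. The following Python sketch is offered as an illustration only; the function name `exceedance_pmf` and the sample values of $n$, $m$, $N$ are assumptions chosen for the example. It checks the normalization (1.3'), the symmetry (1.4) and the special value (1.6).

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) from (1.3), returned as a rational number."""
    numerator = comb(n, m) * m * comb(N, x)
    denominator = (N + n) * comb(N + n - 1, m + x - 1)
    return Fraction(numerator, denominator)


n, m, N = 10, 3, 7  # illustrative sample size, rank and number of future trials

total = sum(exceedance_pmf(n, m, N, x) for x in range(N + 1))

symmetric = all(
    exceedance_pmf(n, m, N, x) == exceedance_pmf(n, n - m + 1, N, N - x)
    for x in range(N + 1)
)

largest_never_exceeded = exceedance_pmf(n, 1, N, 0)  # n / (N + n) by (1.6)
```

Working with `Fraction` rather than floating point keeps identities such as (1.4) exact, so they can be tested with `==`.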


2. The two asymptotic distributions. If both $n$ and $N$ are large, $m$ may increase with $n$ such that the quotient $m/n$ remains constant, and the $m$th values remain near the median. Or, $m$ remains constant such that $m \ll n$, and the $m$th values are extremes.

In the first case, let $n = N = 2k - 1$, where $k$ is large. Then $m = k$ is the rank of the median of the initial distribution. As shown in (1.7), the distribution of the number of exceedances over the initial median is symmetrical. To obtain the asymptotic distribution we reduce $x$ by writing

(2.1) $$x = k + z\sqrt{k}$$

where $z$ remains in a finite interval. The same reduction may be applied to $m$th values in the neighborhood of the initial median. The distribution of the number of exceedances over the initial median is, from (1.3) and (2.1),


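$$w(2k - 1, k, 2k - 1, x) = \text{const}\;\frac{\binom{2k - 1}{k + z\sqrt{k}}}{\binom{4k - 3}{2k + z\sqrt{k} - 1}}.$$

Consider only the factors involving the variate $z$, then the right side becomes, by Stirling's formula,

$$\frac{(2k + z\sqrt{k} - 1)!\,(2k - z\sqrt{k} - 2)!}{(k + z\sqrt{k})!\,(k - z\sqrt{k} - 1)!} \sim \text{const}\;\frac{(2k + z\sqrt{k})^{2k + z\sqrt{k}}\,(2k - z\sqrt{k})^{2k - z\sqrt{k}}}{(k + z\sqrt{k})^{k + z\sqrt{k}}\,(k - z\sqrt{k})^{k - z\sqrt{k}}}.$$

Combination of the factors with the same powers leads to

$$\frac{(4k^2 - kz^2)^{2k}\,(2k + z\sqrt{k})^{z\sqrt{k}}\,(k - z\sqrt{k})^{z\sqrt{k}}}{(k^2 - kz^2)^{k}\,(2k - z\sqrt{k})^{z\sqrt{k}}\,(k + z\sqrt{k})^{z\sqrt{k}}} = \text{const}\;\frac{\left(1 - \frac{z^2}{4k}\right)^{2k}\left(1 + \frac{z}{2\sqrt{k}}\right)^{z\sqrt{k}}\left(1 - \frac{z}{\sqrt{k}}\right)^{z\sqrt{k}}}{\left(1 - \frac{z^2}{k}\right)^{k}\left(1 - \frac{z}{2\sqrt{k}}\right)^{z\sqrt{k}}\left(1 + \frac{z}{\sqrt{k}}\right)^{z\sqrt{k}}}.$$

Since $k$ and $\sqrt{k}$ are large, and $z$ is small, all factors lead to exponential functions whence

$$\exp\left[-\frac{z^2}{2} + \frac{z^2}{2} - z^2 + z^2 + \frac{z^2}{2} - z^2\right] = \exp\left[-\frac{z^2}{2}\right]$$

and finally,

(2.2) $$\lim_{k\to\infty} w(2k - 1, k, 2k - 1, x) = \text{const}\;e^{-z^2/2}.$$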
The number of exceedances over the initial median, $m = k$, in a large sample of size $2k - 1$ in $2k - 1$ future trials is normally distributed with mean, median, mode, and variance equal to $k$. Therefore the probabilities (2.2) may be called the distribution of normal exceedances.
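The normal limit (2.2) can be compared with the exact probabilities (1.3) for moderately large $k$. The Python sketch below is illustrative; the names `exceedance_pmf` and `max_gap_to_normal` are assumptions made for the example. It measures the largest absolute difference between $w(2k - 1, k, 2k - 1, x)$ and the normal density with mean $k$ and variance $k$ evaluated at the integers.

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt


def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) from (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def max_gap_to_normal(k):
    """Largest |w(2k-1, k, 2k-1, x) - normal density with mean k, variance k|."""
    n = N = 2 * k - 1
    gap = 0.0
    for x in range(N + 1):
        exact = float(exceedance_pmf(n, k, N, x))
        z = (x - k) / sqrt(k)
        normal = exp(-0.5 * z * z) / sqrt(2.0 * pi * k)
        gap = max(gap, abs(exact - normal))
    return gap


gap_small_k = max_gap_to_normal(20)
gap_large_k = max_gap_to_normal(200)
```

The gap shrinks as $k$ grows, which is what an asymptotic statement of this kind would lead one to expect; for finite $k$ the exact mean from (3.11) is $k - \tfrac{1}{2}$ rather than $k$, which accounts for part of the remaining difference.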

In the second case where N and n are large, and m and x are small, a distribu- 
tion analogous to the Poisson distribution will be obtained. To indicate that 
N and n are large, they are written NV and n. The probability 


$$w(n, m, N, x) = \frac{(x + m - 1)!\,n!\,N!\,(N + n - x - m)!}{(m - 1)!\,x!\,(n - m)!\,(N - x)!\,(N + n)!}$$

obtained from (1.3) becomes, by use of the Stirling formula,

(2.3) $$w(n, m, N, x) = \binom{x + m - 1}{x} \frac{n^m N^x}{(N + n)^{m+x}} = w(n, n - m + 1, N, N - x).$$

If $n = N$, the preceding formula becomes

(2.4) $$w(n, m, n, x) = \binom{x + m - 1}{x} \left(\tfrac{1}{2}\right)^{m+x} = w(n, n - m + 1, n, n - x).$$

This probability that the $m$th largest (or smallest) value will be exceeded $x$ times (or $n - x$ times) in $n$ future trials is independent of $n$. Since $m$ is small compared to $n$, the probabilities (2.4) may be called the distribution of rare exceedances. For $x = 0$, we obtain the probability

$$w(n, m, n, 0) = \left(\tfrac{1}{2}\right)^{m} = w(n, n - m + 1, n, n)$$

that the largest (or smallest) $m$th extreme value is never (or always) exceeded. For $m = 1$, and $n = N$, the probability

(2.5) $$w(n, 1, n, x) = \left(\tfrac{1}{2}\right)^{x+1} = w(n, n, n, n - x)$$

that the largest (or smallest) value is exceeded $x$ times (or $n - x$ times) is a geometric series.


To obtain the moments of the distribution of rare exceedances (2.4) we construct its generating function

$$G(t) = \left(\tfrac{1}{2}\right)^{m} \sum_{x=0}^{\infty} \binom{x + m - 1}{x} \left(\frac{e^t}{2}\right)^{x}.$$

From the well known expression for the negative binomial follows

(2.6) $$G(t) = \left(\tfrac{1}{2}\right)^{m} \left(1 - \frac{e^t}{2}\right)^{-m},$$

whence, by the usual procedure

(2.7) $$\bar x_m = m.$$

The mean number of exceedances over the $m$th value from above in the distribution of rare exceedances is $m$ itself. The second derivative of (2.6) for $t = 0$ leads to the variance

(2.8) $$\sigma^2 = 2m$$


which is the double of the variance in the Poisson distribution. This difference 
is easily explained: If we apply the Poisson law to the exceedances, we have to 
know the mean number of exceedances. In our case we only know one observed 


number of exceedances. Consequently the variance must be larger than in the 
Poisson case. 


GRAPH 1. The distribution of rare exceedances. (Probability $w(x)$ plotted against the number of exceedances $x$.)


The variance for the distribution (2.2) of the normal exceedances was 
(N + 1)/2, whereas the variance (2.8) for the distribution of rare exceedances, 
2m, is much smaller since m is small compared to N. This interesting relation 
will be generalized in paragraph 3. 


For $m$ increasing, the distributions (2.4) spread as shown in graph 1. The distributions have two modes

(2.9) $$\tilde x_1 = m - 2; \qquad \tilde x_2 = m - 1$$

except for $m = 1$, where the probability diminishes with $x$. The distributions (2.4) are similar to the Poisson distribution for integer $m$. However, for this distribution the modes are $m - 1$ and $m$.
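The statements about the distribution of rare exceedances (2.4), namely mean $m$, variance $2m$ and the two modes $m - 2$ and $m - 1$, can be verified by direct summation. The Python sketch below is illustrative; the names `rare_pmf` and `summary` and the truncation point `cutoff` are assumptions of the example, and the neglected tail is far below floating-point precision for the values used.

```python
from fractions import Fraction
from math import comb


def rare_pmf(m, x):
    """Distribution of rare exceedances (2.4): C(x+m-1, x) * (1/2)^(m+x)."""
    return Fraction(comb(x + m - 1, x), 2 ** (m + x))


def summary(m, cutoff=400):
    """Mean, variance and modes of (2.4), summing x over 0 .. cutoff-1."""
    probs = [rare_pmf(m, x) for x in range(cutoff)]
    mean = float(sum(x * p for x, p in enumerate(probs)))
    second = float(sum(x * x * p for x, p in enumerate(probs)))
    top = max(probs)
    modes = [x for x, p in enumerate(probs) if p == top]  # exact ties kept
    return mean, second - mean ** 2, modes


mean_5, variance_5, modes_5 = summary(5)
```

Because the probabilities are exact rationals, the tie between $w(m - 2)$ and $w(m - 1)$ is detected exactly rather than up to rounding.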
The similarity between the two distributions may also be seen from their behavior for large $m$. In this case, the Poisson distribution for the standardized variate $y = (x - m)/\sigma$ converges toward a normal distribution. The same holds for the distribution of rare exceedances. For the proof consider the standardized variate

(2.10) $$y = (x - m)/\sqrt{2m}.$$

Its moment generating function $G_y(t)$ becomes, from (2.6),

$$G_y(t) = \left(2e^{t/\sqrt{2m}} - e^{2t/\sqrt{2m}}\right)^{-m}.$$

The usual development leads to the second member

$$\left(2 + \frac{2t}{\sqrt{2m}} + \frac{t^2}{2m} - 1 - \frac{2t}{\sqrt{2m}} - \frac{t^2}{m} + O(m^{-3/2})\right)^{-m} = \left(1 - \frac{t^2}{2m} + O(m^{-3/2})\right)^{-m}.$$

If we neglect the factors $O(m^{-3/2})$, we finally obtain

(2.11) $$G_y(t) = e^{t^2/2}$$

which is the normal generating function. Thus the distribution of rare exceedances converges toward normalcy in the same way as the Poisson distribution.
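The convergence of the generating function toward $e^{t^2/2}$ can be observed numerically. The Python sketch below is an illustration under stated assumptions: the function name `standardized_mgf` is hypothetical, and the code uses the algebraic identity $2e^{s} - e^{2s} = 1 - (e^{s} - 1)^2$ so that `expm1` and `log1p` keep the evaluation stable when $s = t/\sqrt{2m}$ is tiny. It requires $s < \ln 2$.

```python
from math import exp, expm1, log1p, sqrt


def standardized_mgf(t, m):
    """G_y(t) = (2 e^s - e^{2s})^(-m) with s = t / sqrt(2m); needs s < ln 2."""
    s = t / sqrt(2.0 * m)
    # 2 e^s - e^{2s} equals 1 - (e^s - 1)^2, which log1p handles accurately.
    return exp(-m * log1p(-expm1(s) ** 2))


limit = exp(0.5)  # e^{t^2/2} at t = 1
error_small_m = abs(standardized_mgf(1.0, 100) - limit)
error_large_m = abs(standardized_mgf(1.0, 10 ** 6) - limit)
```

The error decreases slowly with $m$, in line with the $O(m^{-3/2})$ terms inside the bracket being raised to the power $-m$.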


3. Moments. We return to the general distribution (1.3). For the calculation of the moments, the hypergeometric series $F(\alpha, \beta, \gamma, 1)$ defined by

(3.1) $$F(\alpha, \beta, \gamma, 1) = 1 + \frac{\alpha\beta}{1 \cdot \gamma} + \frac{\alpha(\alpha + 1)\,\beta(\beta + 1)}{1 \cdot 2\;\gamma(\gamma + 1)} + \cdots$$

is used. The $x + 1$st member of this series is

(3.2) $$\frac{\alpha(\alpha + 1) \cdots (\alpha + x - 1)\;\beta(\beta + 1) \cdots (\beta + x - 1)}{x!\;\gamma(\gamma + 1) \cdots (\gamma + x - 1)}.$$

On the other hand, the $x + 1$st member of the distribution $w(x)$ may be written, from (1.3), after changing the signs,

(3.3) $$w(x) = \frac{\binom{n}{m}}{\binom{N + n}{m}}\;\frac{m(m + 1) \cdots (m + x - 1)}{x!}\;\frac{(-N)(-N + 1) \cdots (-N + x - 1)}{(m - n - N)(m - n - N + 1) \cdots (m - n - N + x - 1)}.$$

This is the general member (3.2) of the hypergeometric series, if we write

(3.4) $$\alpha = m; \qquad \beta = -N; \qquad \gamma = m - n - N.$$


Therefore the probability $w(n, m, N, x)$ is the $x + 1$st member in the development of

$$\frac{n!\,(N + n - m)!}{(N + n)!\,(n - m)!}\;F(m, -N, m - n - N, 1).$$

Since the sum of the probabilities $w(x)$ must be unity, we obtain

(3.5) $$F(m, -N, m - n - N, 1) = \frac{(N + n)!\,(n - m)!}{n!\,(N + n - m)!}.$$


This relation will be used for the calculation of the factorial moments $\bar x_{[k]}$ of order $k$ which are, from (3.3),

(3.6) $$\bar x_{[k]} = \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!} \sum_{x=k}^{N} \frac{N(N - 1) \cdots (N - x + 1)\;m(m + 1) \cdots (m + x - 1)}{(x - k)!\;(N + n - m)(N + n - m - 1) \cdots (N + n - m - x + 1)}.$$

The first member in the sum is

(3.7) $$\varphi(1) = \frac{N(N - 1) \cdots (N - k + 1)\;m(m + 1) \cdots (m + k - 1)}{(N + n - m)(N + n - m - 1) \cdots (N + n - m - k + 1)}.$$

The second member is

$$\varphi(2) = \varphi(1)\,\frac{(N - k)(m + k)}{1!\,(N + n - m - k)}.$$

Generally, each successive member is obtained from the preceding one by the same rules as the successive members of the hypergeometric series (3.1). Consequently, from (3.6),

(3.8) $$\bar x_{[k]} = \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!}\;\varphi(1) \left(1 + \frac{(N - k)(m + k)}{1!\,(N + n - m - k)} + \cdots\right).$$

The sum in the brackets is the hypergeometric series

$$F(m + k, -(N - k), m - n - N + k, 1).$$

If we replace, in (3.5), $m$ by $m + k$, $N$ by $N - k$, $n$ by $n + k$, we obtain for the sum in (3.8)

(3.9) $$F(m + k, -(N - k), m - n - N + k, 1) = \frac{(N + n)!\,(n - m)!}{(n + k)!\,(N + n - m - k)!}.$$

Introduction of (3.9) and (3.7) into (3.8) leads to the factorial moments

(3.10) $$\bar x_{[k]} = \frac{m(m + 1) \cdots (m + k - 1)\;N(N - 1) \cdots (N - k + 1)}{(n + 1)(n + 2) \cdots (n + k)}$$

and to the recurrent relation

(3.10') $$\bar x_{[k]} = \frac{(m + k - 1)(N - k + 1)}{n + k}\;\bar x_{[k-1]}.$$

If $n$ and $N$ are both of the same order of magnitude, and large compared to $k$, the expression (3.10) simplifies to

(3.10'') $$\bar x_{[k]} = m(m + 1) \cdots (m + k - 1).$$
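The closed form (3.10) for the factorial moments can be compared with a direct summation over the exact probabilities (1.3). The Python sketch below is illustrative only; the function names and the sample parameters are assumptions of the example. The product form used in `factorial_moment_formula` is the recurrence (3.10') unrolled from $\bar x_{[0]} = 1$.

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) from (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def factorial_moment_direct(n, m, N, k):
    """Sum of x(x-1)...(x-k+1) * w(x) over x = k .. N."""
    total = Fraction(0)
    for x in range(k, N + 1):
        falling = 1
        for j in range(k):
            falling *= x - j
        total += falling * exceedance_pmf(n, m, N, x)
    return total


def factorial_moment_formula(n, m, N, k):
    """Closed form (3.10), built factor by factor as in (3.10')."""
    value = Fraction(1)
    for j in range(k):
        value *= Fraction((m + j) * (N - j), n + 1 + j)
    return value


n, m, N = 10, 3, 7
agreement = all(
    factorial_moment_direct(n, m, N, k) == factorial_moment_formula(n, m, N, k)
    for k in range(5)
)
mean = factorial_moment_formula(n, m, N, 1)  # m N / (n + 1)
```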


GRAPH 2. Averages of numbers of exceedances.


For $k = 1$ we obtain the mean number of exceedances $\bar x_m$ over the $m$th largest value in $N$ future trials

(3.11) $$\bar x_m = N\,\frac{m}{n + 1}.$$

This expression is identical with the classical formula $\bar x = N(1 - F_m)$ in the Bernoulli distribution (1.1), since the mean of $1 - F_m$ obtained from (1.2) is $m/(n + 1)$. In both distributions the means need not be integers. The mean number of exceedances over the smallest value is $n$ times the mean number of exceedances over the largest value. If $N = n + 1$, we have $\bar x_m = m$, and the same holds if $n$ and $N$ are large. If $n$ is odd, and $m = (n + 1)/2$, the mean number of exceedances over the median of $n$ observations is $N/2$. The means $\bar x_m$ are traced against $m$ in Graph 2 for $n = N = 9$, and $n = N = 10$.
The mean number ${}_m\bar x$ of exceedances over the $m$th value from below is related to $\bar x_m$ by

(3.12) $$\bar x_m + {}_m\bar x = N.$$

The variances $\sigma_m^2$ and ${}_m\sigma^2$ of the number of exceedances over the $m$th values from above and below become, from (3.10),

$$\sigma_m^2 = \frac{mN}{n + 1}\left(1 + \frac{(m + 1)(N - 1)}{n + 2} - \frac{mN}{n + 1}\right).$$

The choice of a common denominator leads, after trivial calculations, to

(3.13) $$\sigma_m^2 = \frac{mN(n - m + 1)(N + n + 1)}{(n + 1)^2 (n + 2)} = {}_m\sigma^2.$$

The variances increase with $N$ and diminish strongly with increasing $n$. The variance is maximum for $m = (n + 1)/2$, i.e. for the median observation where it becomes

$$\sigma_{(n+1)/2}^2 = \frac{N(N + n + 1)}{4(n + 2)}.$$

The variances of the number of exceedances over the largest and the smallest value are

(3.13') $$\sigma_1^2 = \frac{nN(N + n + 1)}{(n + 1)^2 (n + 2)} = \sigma_n^2.$$

The quotient of the variances of the median and of the extremes is

(3.14) $$\frac{\sigma_{(n+1)/2}^2}{\sigma_1^2} = \frac{(n + 1)^2}{4n} = \frac{\sigma_{(n+1)/2}^2}{\sigma_n^2}.$$

Consequently the variance of the median is about $n/4$ times larger than the variance of the extremes. In other words, the extremes are more reliable than the median, and this quality increases with the sample size. This is a generalization of the relation obtained in paragraph 2. Such a behavior seems singular. However, it also holds for the uniform distribution, and for the distribution (1.2) of the frequencies [1].


In Bernoulli's case, the variance $\sigma_B^2$ is, after replacing $1 - F_m$ by $m/(n + 1)$,

$$\sigma_B^2 = N\,\frac{m}{(n + 1)}\,\frac{(n - m + 1)}{(n + 1)},$$

whence, from (3.13),

$$\sigma_m^2 = \frac{N + n + 1}{n + 2}\,\sigma_B^2.$$

The variance in our case is larger than in Bernoulli's case, since we do not assume the knowledge of the probability $F_m$ which is required for the Bernoulli distribution. For $N = n + 3$, the variance becomes twice the variance of the Bernoulli distribution. This is a generalization of formula (2.8).
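The variance formula (3.13) and its relation to the Bernoulli variance can be checked exactly. The Python sketch below is an illustration; all function names and sample parameters are assumptions of the example. It computes the variance by direct summation over (1.3), compares it with (3.13), and confirms the factor $(N + n + 1)/(n + 2)$, which equals 2 when $N = n + 3$.

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) from (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def variance_direct(n, m, N):
    """Variance of the number of exceedances by summation over x = 0 .. N."""
    probs = [exceedance_pmf(n, m, N, x) for x in range(N + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    second = sum(x * x * p for x, p in enumerate(probs))
    return second - mean ** 2


def variance_formula(n, m, N):
    """Closed form (3.13)."""
    return Fraction(m * N * (n - m + 1) * (N + n + 1), (n + 1) ** 2 * (n + 2))


def bernoulli_variance(n, m, N):
    """N p (1 - p) with p = m / (n + 1) substituted for 1 - F_m."""
    return Fraction(N * m * (n - m + 1), (n + 1) ** 2)


n, m = 10, 3
matches = variance_direct(n, m, 7) == variance_formula(n, m, 7)
doubling = variance_formula(n, m, n + 3) == 2 * bernoulli_variance(n, m, n + 3)
```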


4. The mode and the median. We ask for the most probable number $\tilde x$ of exceedances over the previous $m$th largest among $n$ observations in $N$ future trials. If a mode exists, it must be an integer. Since the distribution $w(x)$ decreases (or increases) with $x$ for $m = 1$ (or $m = n$) we only consider

(4.1) $$2 \le m \le n - 1.$$

The mode is obtained from the inequalities

(4.2) $$w(n, m, N, x - 1) \le w(n, m, N, x) \ge w(n, m, N, x + 1)$$

which lead, from (1.5) to

(4.3) $$(m - 1)\,\frac{N + 1}{n - 1} - 1 \le \tilde x \le (m - 1)\,\frac{N + 1}{n - 1}.$$

The length of the interval is unity, as for the Bernoulli distribution.
There are several cases where two modes exist. 
a) Let the number of future trials $N$ be such that

(4.4) $$N = k(n - 1) - 1$$

where $k$ is a positive integer. Then the modes are, from (4.3)

(4.5) $$\tilde x_{(1)} = k(m - 1) - 1; \qquad \tilde x_{(2)} = k(m - 1).$$

b) The modes (4.5) also hold if $m$ and $N$ are large compared to unity, and if $N = k'n$, where $k'$ is again an integer.

c) If $n$ is odd, the median of the initial variate has the rank $m = (n + 1)/2$. If, at the same time, $N$ is odd, there are two modes, namely

(4.6) $$\tilde x_{(1)} = (N - 1)/2; \qquad \tilde x_{(2)} = (N + 1)/2.$$

In the case $N = n$, the two modes $\tilde x_{(1)} = m - 1$, and $\tilde x_{(2)} = m$ differ by unity from the modes valid in the two previous cases.

In the case $n = N$, and $m \neq (n + 1)/2$, only one mode exists. To find its location, consider first the case that $n = N$ is even, and $m \le n/2$. Then the upper limit in (4.3) is

$$[m - 1] + \frac{2}{n - 1}(m - 1) < [m].$$

Since the interval has unit length, the mode is $\tilde x = m - 1$. If $m > (n + 1)/2$, the lower limit is

$$[m - 2] + \frac{2}{n - 1}(m - 1) > [m - 1].$$

The case that $n = N$ is odd is treated in the same way, and leads to the following result: The most probable numbers of exceedances over the $m$th value in $N = n$ future trials are

(4.7)
$\tilde x = m - 1$ for $m \le n/2$; $\tilde x = m$ for $m \ge (n/2) + 1$, if $n = N$ is even,
$\tilde x = m - 1$ for $m \le (n + 1)/2$; $\tilde x = m$ for $m \ge (n + 1)/2$, if $n = N$ is odd.
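The location of the modes can be confirmed by brute force from the exact probabilities. The Python sketch below is illustrative; the names `modes` and `mode_interval` and the chosen parameter values are assumptions of the example. It lists every $x$ attaining the maximum of $w(x)$ and compares the result with the interval (4.3) and with the cases (4.5), (4.6) and (4.7).

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) from (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def modes(n, m, N):
    """All x in 0 .. N at which w(n, m, N, x) is largest (exact ties kept)."""
    probs = [exceedance_pmf(n, m, N, x) for x in range(N + 1)]
    top = max(probs)
    return [x for x, p in enumerate(probs) if p == top]


def mode_interval(n, m, N):
    """Lower and upper limits of (4.3); valid for 2 <= m <= n - 1."""
    upper = Fraction((m - 1) * (N + 1), n - 1)
    return upper - 1, upper


even_low = modes(10, 3, 10)    # n = N even, m <= n/2
even_high = modes(10, 7, 10)   # n = N even, m >= n/2 + 1
odd_median = modes(9, 5, 9)    # n = N odd, m = (n + 1)/2
case_a = modes(5, 3, 7)        # N = k(n - 1) - 1 with k = 2
```

When the upper limit in (4.3) is an integer, both endpoints of the interval are modes, which is how the double modes in cases a) and c) arise.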
We now consider the median. If the probabilities $w(x)$ are summed up from $x = 0$ onward, there may exist an integer $\breve x_m$ such that the probability for at most $\breve x_m$ exceedances is $\tfrac{1}{2}$. This is the median number of exceedances. Such a number need not exist. Assume, for example, $N < n$, then the probability $w(n, 1, N, 0)$ alone (see (1.6)) surpasses $\tfrac{1}{2}$, and the number of exceedances over the largest and the smallest value do not possess a median. If the median $\breve x_m$ exists, it follows from the symmetry (1.4) that $N - \breve x_m - 1$ is the median of the number of exceedances over the $m$th value from below. The relation

(4.8) $$\breve x_m + {}_m\breve x = N - 1$$

differs from the corresponding relation (3.12) for the mean. In some special cases, the median can be obtained immediately. For $x = 0$, $m = 1$, $n = N$, formula (1.6) leads to

$$w(n, 1, n, 0) = \tfrac{1}{2} = w(n, n, n, n).$$

The probability that the largest (or smallest) of $n$ past observations will never (or always) be exceeded in $n$ future trials is equal to $\tfrac{1}{2}$. If $n$ and $N$ are odd, and $m = (n + 1)/2$, the summation of equation (1.7) yields, with the help of (1.3'),

$$\sum_{z=0}^{x} w(z) = \sum_{z=N-x}^{N} w(z) = 1 - \sum_{z=x+1}^{N} w(z).$$

Now the median number of exceedances $\breve x$ is such that the two sums on the right sides are equal to $\tfrac{1}{2}$. Consequently the median number of exceedances in this case is $m - 1$.
We claim that

(4.9) $$\breve x_m = m - 1$$

for all $m$, provided that $n = N$. For the proof, consider the probability $W(n, m, N, x)$ that the $m$th largest value is exceeded at most $x$ times in $N$ future trials. This is the sum of the first $x + 1$ members $w(x)$. Let $F_\nu(\alpha, \beta, \gamma, 1)$ be the sum of the first $\nu$ members of the hypergeometric series (3.1). Then the substitutions (3.4) and $\nu = x + 1$ lead to

(4.10) $$W(n, m, N, x) = \frac{\binom{n}{m}}{\binom{N + n}{m}}\;F_{x+1}(m, -N, m - n - N, 1).$$
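For the sums of the hypergeometric series $F_\nu(\alpha, \beta, \gamma, 1)$ the following recurrence formula [2] is used.

(4.11) $$\frac{(\gamma - \beta - \alpha)(\gamma - \beta - \alpha + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)}\;F_\nu(\alpha, \beta, \gamma, 1) = 1 - \frac{\beta(\beta + 1) \cdots (\beta + \nu - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - \alpha + \nu - 1)}\;F_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu, 1).$$

The substitutions used in (3.4), and $\nu = x + 1$ lead to

$$\frac{(-n)(-n + 1) \cdots (-n + m - 1)}{(-n - N)(-n - N + 1) \cdots (-n - N + m - 1)}\;F_{x+1}(m, -N, m - n - N, 1) = 1 - \frac{(-N)(-N + 1) \cdots (-N + x)}{(-n - N)(-n - N + 1) \cdots (-n - N + x)}\;F_m(x + 1, -n, -n - N + x + 1, 1).$$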

This equation may be written from (4.10)

$$W(n, m, N, x) = 1 - \frac{\binom{N}{x+1}}{\binom{N+n}{x+1}}\, F_m(x + 1, -n, -n - N + x + 1, 1). \tag{4.12}$$

For $x = m - 1$, and $N = n$, the equation becomes

$$W(n, m, n, m - 1) = 1 - \frac{\binom{n}{m}}{\binom{2n}{m}}\, F_m(m, -n, -2n + m, 1).$$

From (4.10) it follows that the second term on the right side is equal to the
left side

$$W(n, m, n, m - 1) = \frac{\binom{n}{m}}{\binom{2n}{m}}\, F_m(m, -n, -2n + m, 1).$$

Consequently

$$W(n, m, n, m - 1) = \tfrac{1}{2}. \tag{4.13}$$

If $n = N$, the median number $\breve{x}_m$ of exceedances over the mth largest value is $m - 1$,
as stated previously. The means, modes, and medians obtained from the exact
formulae (3.11), (4.7) and (4.9) are traced in graph (2) for $n = N = 9$, and
$n = N = 10$.
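
The median result can be checked exactly for small samples. The sketch below is illustrative only: formula (1.6) is not reproduced in this section, so the function `w` assumes the form $w(n, m, N, x) = \binom{n}{m} m \binom{N}{x} / [(N + n)\binom{N+n-1}{m+x-1}]$, which agrees with (5.3) at $x = 0$ and with the term ratios of the hypergeometric series in (4.10). The names `w` and `W` are chosen here for convenience.

```python
from fractions import Fraction
from math import comb


def w(n, m, N, x):
    """Probability that the mth largest of n past observations is exceeded
    exactly x times in N future trials (assumed form of formula (1.6))."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def W(n, m, N, x):
    """Probability of at most x exceedances: the sum of the first x + 1 terms."""
    return sum(w(n, m, N, z) for z in range(x + 1))


if __name__ == "__main__":
    # For n = N the median number of exceedances should be m - 1, see (4.13).
    for n in (9, 10):
        for m in range(1, n + 1):
            assert W(n, m, n, n) == 1                    # probabilities sum to one
            assert W(n, m, n, m - 1) == Fraction(1, 2)   # relation (4.13)
    print("W(n, m, n, m - 1) = 1/2 for n = N = 9 and n = N = 10")
```

Exact rational arithmetic is used so that the equality with 1/2 is tested without rounding error.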
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5. Probabilities of at least one exceedance. If we sum up the probabilities 
$w(x)$ from zero up to a certain x (or from a certain x up to N), we obtain the
probabilities $W(x)$ (or $P(x)$) for at most (or at least) x exceedances over the
mth past value in N future trials

$$W(x) = \sum_{z=0}^{x} w(z); \qquad P(x) = \sum_{z=x}^{N} w(z) \tag{5.1}$$

where
$$W(x) + P(x + 1) = 1; \qquad W(x - 1) + P(x) = 1.$$
The boundary conditions are
$$W(0) = w(0); \qquad W(N) = 1; \qquad P(0) = 1; \qquad P(N) = w(N).$$


From the symmetry (1.4) it follows that the probability for the mth value from 
above to be exceeded at most x times is equal to the probability for the mth 
value from below to be exceeded at least N — x times. 

From (5.1) and (1.3) it follows for $m = 1$ (and $m = n$) that the probabilities
for the largest (or smallest) among n observations to be exceeded at most once
in n future trials converge toward 3/4 (or zero), respectively. If n is large, the
probability that the largest value will be exceeded at most x times in n future
trials is, by virtue of (2.5),

$$W(n, 1, n, x) = 1 - \left(\tfrac{1}{2}\right)^{x+1} = P(n, n, n, n - x) \tag{5.2}$$

independent of n.
Consider now the probability that the mth largest value will be exceeded at
least once in N future trials

$$P(n, m, N, 1) = 1 - \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!} = W(n, n - m + 1, N, N - 1). \tag{5.3}$$

If N and n are large, and m is small, this expression becomes

$$P(n, m, N, 1) = 1 - \left(\frac{n}{N + n}\right)^{m} = W(n, n - m + 1, N, N - 1).$$

For $m = 1$ and $n = N$, the probability is 1/2, independent of the size of n.
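
As a rough illustration of how close the large-sample expression is to the exact formula (5.3), the following sketch evaluates both. The function names are hypothetical, and the chosen values of n, m and N are arbitrary examples.

```python
from fractions import Fraction
from math import factorial


def p_at_least_once(n, m, N):
    """Exact probability (5.3) that the mth largest of n past observations
    is exceeded at least once in N future trials."""
    miss = Fraction(factorial(n) * factorial(N + n - m),
                    factorial(n - m) * factorial(N + n))
    return 1 - miss


def p_at_least_once_large(n, m, N):
    """Approximation for large n and N and small m: 1 - (n / (N + n))**m."""
    return 1.0 - (n / (N + n)) ** m


if __name__ == "__main__":
    # m = 1 and n = N gives exactly 1/2 for every sample size.
    assert p_at_least_once(50, 1, 50) == Fraction(1, 2)
    exact = float(p_at_least_once(1000, 2, 3000))
    approx = p_at_least_once_large(1000, 2, 3000)
    print(exact, approx)
```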

The least number of exceedances over the smallest value for given probabilities
P, called the tolerance limit, has been derived by S. S. Wilks [3]. A related
problem is the following: How many trials N have to be made in order that there
is a given probability $\alpha$ for the mth largest value to be exceeded at least once? By
virtue of (5.3) we obtain N from

$$\frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!} = 1 - \alpha. \tag{5.4}$$

For the largest value $m = 1$, this equation leads to

$$\frac{N}{n} = \frac{\alpha}{1 - \alpha}. \tag{5.5}$$

Of course, N/n increases with $\alpha$. If n is large, and m remains small, equation
(5.4) leads, in first approximation, to

$$\frac{N}{n} = \frac{1}{(1 - \alpha)^{1/m}} - 1. \tag{5.6}$$

The quotients N/n as function of $\alpha$ are traced in graph (3). The quotient is
plotted vertically against $1/(1 - \alpha)$ plotted horizontally, both in logarithmic
scales. The abscissa shows the probability $\alpha$. The curve for $m = 1$ is exact. The
corresponding curves for the penultimate and the two preceding values
($m = 2, 3, 4$) are obtained from the approximation (5.6). The graph reads in the
following way: The probability that the largest, or second, or third, or fourth
value from above is exceeded at least once in 100n, or 9n, or 3.6n, or 2.2n future
trials is $\alpha = 0.99$. Inversely, in 4n future trials the probability that the largest,
or the second, or the third, or fourth extreme value is exceeded at least once
is $\alpha = 0.80$, or 0.96, or 0.992, or 0.9984, respectively.
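
The readings quoted from graph (3) can be reproduced from the approximation $N/n = (1 - \alpha)^{-1/m} - 1$, which coincides with the exact relation $N/n = \alpha/(1 - \alpha)$ when $m = 1$. The sketch below is only an illustration; the function names are not taken from the paper.

```python
def trials_ratio(alpha, m):
    """Quotient N/n needed so that the mth largest value is exceeded at least
    once with probability alpha (approximation for m > 1, exact for m = 1)."""
    return (1.0 - alpha) ** (-1.0 / m) - 1.0


def prob_exceeded(ratio, m):
    """Inverse relation: probability alpha for a given quotient N/n."""
    return 1.0 - (1.0 + ratio) ** (-m)


if __name__ == "__main__":
    for m in (1, 2, 3, 4):
        # Quotient for alpha = 0.99, and probability for N = 4n.
        print(m, round(trials_ratio(0.99, m), 2), round(prob_exceeded(4.0, m), 4))
```

For $\alpha = 0.99$ the quotients come out near 99, 9, 3.6 and 2.2, in line with the values read from the graph.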

In a similar way we calculate the probabilities that the largest (and penultimate)
among n observations is exceeded at least twice in N future trials. Let
$\alpha_1$ be this probability. Then we have for the largest value

$$1 - \alpha_1 = w(n, 1, N, 0) + w(n, 1, N, 1) = \frac{n}{N + n}\left(1 + \frac{N}{N + n - 1}\right).$$

For n sufficiently large, the expression simplifies to

$$1 - \alpha_1 = \frac{2\,\dfrac{N}{n} + 1}{\left(\dfrac{N}{n} + 1\right)^{2}}. \tag{5.7}$$

The probability $\alpha_1$ as function of N/n is also traced in Graph (3) and designated
by $m = 1$, $x = 2$. Finally, for $m = 2$ the probability $\alpha_2$ for the penultimate value
to be exceeded at least twice is obtained for large n by


$$1 - \alpha_2 = \frac{3\,\dfrac{N}{n} + 1}{\left(\dfrac{N}{n} + 1\right)^{3}}. \tag{5.8}$$

This probability $\alpha_2$ is also traced in Graph 3 and designated by $m = 2$, $x = 2$.

If we fix the probabilities $\alpha_i$, the graph shows the number of future trials
corresponding to 1 and 2 exceedances over the largest, the penultimate, and the
two preceding observations.
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6. Applications. In 50% of all cases, the largest (or smallest) of n past observa- 
tions will not (or always) be exceeded in N = n future trials. The mean number 
of exceedances is the mean in the Bernoulli distribution. The variance is largest 
for the median, and smallest for the extremes, and this superiority of the extremes 
increases with the sample size. 

If the previous, and the future sample sizes both are large and equal, the dis- 
tribution of the number of exceedances over the median observation is normal 
with mean and variance of the order n/2, whereas the distribution of the ex- 
ceedances over the mth extremes (the law of rare exceedances), similar to the 
Poisson distribution, has the mean m, and the variance 2m, m being small com- 
pared to the sample size. Elementary calculations lead to the setting of sample 
sizes N corresponding to given probabilities for 1 or 2 exceedances over the past 
largest and penultimate observation. 

These methods may be of interest for forecasting floods if, instead of the size 
of the flood, we are interested only in the frequency. The same procedure may 
also be applied to other meteorological phenomena such as droughts, the extreme 
temperatures (the killing frost), the largest precipitations, etc., and permits forecasting
the number of cases surpassing a given severity within the next N years.
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ON THE ASYMPTOTIC DISTRIBUTION OF THE SUM OF POWERS 
OF UNIT FREQUENCY DIFFERENCES¹


By Bradford F. Kimball


New York State Department of Public Service 


1. Summary. Since the “unit” frequency differences (see (2.2) below) are
dependent, the usual methods for establishing the normal character of the 
asymptotic distribution of the sum of random variables fail. 

However, the essential character of the distribution is disclosed by the integral 
functional relationship (3.6). From this it is possible to show that for large 
samples the distribution approximates “stability” in the normal sense ([2] and 
Lemma 2). 

Using the condition that the third logarithmic derivative of the characteristic 
function is uniformly bounded for all n on a neighborhood of t = 0 one can 
prove that the asymptotic distribution exists and is normal. 


2. Introduction. Consider a one dimensional statistical universe characterized
by a cumulative frequency function (cdf) $F(x)$ which is continuous. Consider
an ordered random sample $x_i$ of size N such that

$$x_i \le x_{i+1}, \qquad i = 1 \text{ to } N - 1. \tag{2.1}$$

Consider frequency differences $u_i$ defined by

$$u_1 = F(x_1), \qquad u_{N+1} = 1 - F(x_N), \qquad u_{i+1} = F(x_{i+1}) - F(x_i), \quad i = 1 \text{ to } N - 1. \tag{2.2}$$

Thus

$$\sum_{N+1} u_i = 1, \tag{2.3}$$

and the formal integral of the probability density function (pdf) of the $u_i$ taken
over the complete sample space of $x_i$ can be written as

$$N! \int du_1\, du_2 \cdots du_{h-1}\, du_{h+1} \cdots du_{N+1} = 1, \tag{2.4}$$

where $u_h$ is any $u_i$ which it is found convenient to omit, and the region of integration
is the N-fold Euclidean space bounded by the coordinate hyperplanes

$$u_i = 0, \qquad i \ne h, \quad i = 1, 2, \cdots, N + 1,$$

and the hyperplane

$$u_1 + u_2 + \cdots + u_{h-1} + u_{h+1} + \cdots + u_{N+1} = 1. \tag{2.5}$$

(See [1]).
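
The definition (2.2) can be made concrete with a small numerical sketch. It is an illustration only: the exponential universe, the seed and the function name `unit_frequency_differences` are assumptions introduced here, not part of the paper.

```python
import math
import random


def unit_frequency_differences(sample, cdf):
    """Return the N + 1 frequency differences u_i of definition (2.2)
    for a sample of size N and a continuous cdf F."""
    # F is monotone, so sorting the F-values is the same as ordering the sample.
    f_values = sorted(cdf(x) for x in sample)
    edges = [0.0] + f_values + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    rng = random.Random(0)                                # fixed seed, arbitrary choice
    sample = [rng.expovariate(1.0) for _ in range(7)]     # hypothetical universe
    u = unit_frequency_differences(sample, lambda x: 1.0 - math.exp(-x))
    print(len(u), sum(u))   # N + 1 = 8 differences, adding up to 1 as in (2.3)
```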


1 This is the second paper in connection with the subject announced in Abstract No. 9,
Annals of Math. Stat., Vol. 17 (1946), p. 502; and Abstract No. 331, Bull. Am. Math. Soc.,
Vol. 52 (1946), p. 827. For first paper, see [1].

Consider a test function $y_M$ defined by

$$y_M = \sum_M u_i^p, \qquad p > 0, \quad M \le N + 1, \tag{2.6}$$

where p is a real positive number, M is an integer less than or equal to $N + 1$
and such that if $M < N + 1$ the $u_i$ which are to be omitted may be arbitrarily
selected, but the subscripts indicating the order relation (2.2) are for the present
retained.

Consider the case where N is odd and M is even, and set

$$N = 2n + 1, \qquad M = 2m. \tag{2.7}$$

Divide the set of $N + 1$ frequency differences $u_i$ defined by (2.2) into two
subsets such that each subset contains $n + 1$ differences of which exactly m are
included in the test function (2.6). Now let N become infinite over odd numbers
$N_1, N_2, \cdots$. In other words the sample size is to increase without limit. For
each sample size $N_i$ in such a sequence let $M_i$ be an even number such that

$$M_i \le N_i + 1 \tag{2.8}$$

and such that the ratio $M_i/N_i$ is controlled for large values of N by

$$\lim M_i/N_i = \text{constant } c, \qquad 0 < c \le 1. \tag{2.9}$$

As above for each step in the sequence the set of $N_i + 1$ frequency differences
$u_j$ is divided into two subsets of $n_i + 1$ frequencies each with

$$N_i = 2n_i + 1, \qquad M_i = 2m_i, \tag{2.10}$$

such that $m_i$ frequencies of each subset are included in the test function

$$y_{M_i} = \sum_{M_i} u_j^p. \tag{2.11}$$

Now we note that for a random sample of size N taken from the above universe,
the characteristic function $G_N(t; y_M)$ may be defined by

$$G_N(t; y_M) = N! \int e^{i t y_M}\, du_1\, du_2 \cdots du_N \tag{2.12}$$

taken over region in Euclidean space of N dimensions as indicated for the
integral (2.4), taking index h equal to $N + 1$.


3. Proof of integral relationship—Lemma 1. For simplicity of notation drop 
subscripts from $M_i$, $N_i$, $n_i$ and $m_i$. We separate the test function $y_M$ into two
parts $y_m$ and $y_{m'}$ such that

$$y_M = y_m + y_{m'} = \sum_m u_i^p + \sum_{m'} u_j^p, \qquad m = m' = M/2 \tag{3.1}$$

where the m frequency differences $u_i$ in $y_m$ are those included in first subset and
those contained in $y_{m'}$ are those of the original M frequencies included in the
second subset (see (2.10) and (2.11)).

The formal integral defining $G_N(t; y_M)$ may be written

$$G_N(t; y_M) = \Gamma(2n + 2) \int_{R_2} e^{i t y_m}\, du_1 \cdots du_{n+1} \int_{R_1} e^{i t y_{m'}}\, du_{n+2} \cdots du_{2n+1} \tag{3.2}$$

where

$R_2$ = $2n + 1$ dimensional Euclidean space bounded by coordinate hyperplanes
and plane $\sum_{2n+1} u_i = 1$,

$R_1$ = n dimensional Euclidean space bounded by the coordinate hyperplanes
and the plane

$$u_{n+2} + u_{n+3} + \cdots + u_{2n+1} = 1 - w,$$

$$w = u_1 + u_2 + \cdots + u_{n+1}. \tag{3.3}$$

Now introduce the transformation to $u_i'$

$$u_i'(1 - w) = u_i, \qquad i = n + 2, n + 3, \cdots, 2n + 1, 2n + 2. \tag{3.4}$$

Thus we have

$$\sum_{n+1} u_i' = 1,$$

and the n $u_i'$ involved in the integration are bounded above by the hyperplane
$\sum_n u_i' = 1$. The Jacobian is $(1 - w)^n$.

Similarly under transformation

$$v_i w = u_i, \qquad i = 1, 2, \cdots, n + 1, \tag{3.5}$$

$$\sum_{n+1} v_i = 1.$$

Let $v_i$, $i = 1, 2, \cdots, n$ and w replace the remaining variables of integration.
Thus the region of integration of these $v_i$ is $v_i > 0$ with the hyperplane $\sum_n v_i = 1$
furnishing the upper bound. The Jacobian of the transformation is $w^n$.

The regions of integration of these new variables $u_i'$ and $v_i$ are seen to be
independent of each other and of w. Noting effect of above transformations on
$y_m$ and $y_{m'}$, the integral (3.2) will be found to reduce to the following form:

$$G_N(t; y_M) = \frac{\Gamma(2n + 2)}{\Gamma^2(n + 1)} \int_0^1 w^n (1 - w)^n\, G_n(t w^p; y_m)\, G_n(t(1 - w)^p; y_{m'})\, dw, \tag{3.6}$$

where

$$N = 2n + 1, \qquad M = 2m.$$


Lemma 1. This functional relationship holds for all values of N and M subject
to the condition that N be an odd integer and M an even integer. One may note that a
similar integral functional relationship will hold for any partition $(n_0, n_1)$ of the
$N - 1$ free frequency differences such that

$$n_0 + n_1 = N - 1, \qquad m_0 + m_1 = M,$$

with corresponding changes in the Gamma functions which precede the integral.

In order to find out what happens when N becomes large the partially normalized
test function $z_M$ is introduced. This is defined by

$$z_M = (y_M - \bar{y}_M)(N + 1)^p/\sqrt{M}, \tag{3.7}$$

where (cf. [1], formula (3.1))

$$\bar{y}_M = E(y_M) = \frac{M\,\Gamma(N + 1)\,\Gamma(p + 1)}{\Gamma(N + 1 + p)}. \tag{3.8}$$

I have referred to $z_M$ as a partially normalized variable since

$$E(z_M) = 0, \qquad \lim_{N \to \infty} E(z_M^2) = \Gamma(2p + 1) - \Gamma^2(p + 1) - c\,p^2\,\Gamma^2(p + 1), \tag{3.9}$$

where this limit can be shown to be greater than zero for

$$p \ne 1, \quad 0 < c \le 1; \qquad p = 1, \quad 0 < c < 1. \tag{3.10}$$


Recalling the separation of the test function into two parts (see (3.1)) we
define $\bar{y}_m$ and $\bar{y}_{m'}$ by

$$\bar{y}_m = \bar{y}_{m'} = \frac{m\,\Gamma(n + 1)\,\Gamma(p + 1)}{\Gamma(n + 1 + p)} \tag{3.11}$$

with
$$M = 2m, \qquad N = 2n + 1.$$
From Stirling's formula it can then be shown that

$$(N + 1)^p\, \bar{y}_M/\sqrt{M} = (2^p/\sqrt{2})\, 2[(n + 1)^p\, \bar{y}_m/\sqrt{m}] + o(1), \tag{3.12}$$

where o(1) goes to zero as N and M become infinite subject to the condition
(2.9). Thus if we define $z_m$ and $z_{m'}$ by

$$z_m = (y_m - \bar{y}_m)(n + 1)^p/\sqrt{m}, \qquad z_{m'} = (y_{m'} - \bar{y}_{m'})(n + 1)^p/\sqrt{m}, \tag{3.13}$$

since
$$y_M = y_m + y_{m'}$$
and
$$(N + 1)^p/\sqrt{M} = (2^p/\sqrt{2})(n + 1)^p/\sqrt{m},$$
it follows that

$$z_M = (2^p/\sqrt{2})(z_m + z_{m'}) + o(1). \tag{3.14}$$

Hence if we denote the characteristic function of the distribution of the
partially normalized test function $z_M$ by $G_N(t; z_M)$ and proceed to develop an
integral functional relationship similar to (3.6), one arrives at

$$G_N(t; z_M) = e^{i t\, o(1)}\, \frac{\Gamma(2n + 2)}{\Gamma^2(n + 1)} \int_0^1 w^n (1 - w)^n\, G_n[t\, 2^p w^p/\sqrt{2};\, z_m]\, G_n[t\, 2^p (1 - w)^p/\sqrt{2};\, z_{m'}]\, dw \tag{3.15}$$

with
$$N = 2n + 1, \qquad M = 2m.$$
4. Resulting functional relationship when N becomes large. The second
lemma shows that the functional equation satisfied by the characteristic function
of a normal distribution is approximated when N is large. Suppose we now set

$$w = (1 + s)/2, \qquad 1 - w = (1 - s)/2, \qquad dw = ds/2. \tag{4.1}$$

Substituting in (3.15) we have

$$G_N = \frac{e^{i t\, o(1)}\, \Gamma(2n + 2)}{2^{2n+1}\, \Gamma^2(n + 1)} \int_{-1}^{1} (1 - s^2)^n\, G_n[t(1 + s)^p/\sqrt{2};\, z_m]\, G_n[t(1 - s)^p/\sqrt{2};\, z_{m'}]\, ds. \tag{4.2}$$

Set

$$H(t, s) = G_n[t(1 + s)^p/\sqrt{2};\, z_m]\, G_n[t(1 - s)^p/\sqrt{2};\, z_{m'}]. \tag{4.3}$$

Then

$$H_s = G_n' G_n\, t p (1 + s)^{p-1}/\sqrt{2} - G_n G_n'\, t p (1 - s)^{p-1}/\sqrt{2}. \tag{4.4}$$

Using law of mean write

$$H(t, s) = H(t, 0) + s H_s\{t, h(s)\}, \qquad 0 < |h(s)| < |s|. \tag{4.5}$$

Substituting in (4.2) we have

$$e^{-i t\, o(1)}\, G_N = H(t, 0) + \frac{\Gamma(2n + 2)}{2^{2n+1}\, \Gamma^2(n + 1)} \int_{-1}^{1} H_s[t, h(s)]\, (1 - s^2)^n\, s\, ds. \tag{4.6}$$

With $E(z_m) = 0$, from the fact that the limiting variance of $z_m$ is bounded
(see (3.9)) it follows that the first derivative of its characteristic function remains
bounded in any finite interval, for all n ([3], p. 90). Thus

$$|G_n'(t; z_m)| < A, \qquad 0 \le |t| \le D, \quad \text{for all } n. \tag{4.7}$$

For case $p \ge 1$, by virtue of condition (4.7) $H_s$ will remain bounded over
interval of integration of (4.6) as N becomes infinite. Let B denote such upper
bound of the absolute value of $H_s$. Then, carrying out the integration

$$\text{absolute value of integral} < \frac{B\, \Gamma(2n + 2)}{2^{2n+1}\, \Gamma^2(n + 1)} \cdot \frac{1}{n + 1} \tag{4.8}$$

for any value of t. This quantity approaches zero as N goes to infinity uniformly
for t on any finite range. For the case that $0 < p < 1$ a similar argument may
be used by including the factor $(1 - s)^{p-1}$ which appears in $H_s$ in the integration,
and placing the upper bound on the absolute value of the factor $G_n' G_n$.
Substituting back for $H(t, 0)$ in (4.6) one arrives at

Lemma 2. The characteristic function $G_n(t; z_m)$ satisfies the relationship

$$G_N(t; z_M) = [G_n(t/\sqrt{2};\, z_m)]^2 + o(1), \qquad N = 2n + 1, \quad M = 2m, \tag{4.9}$$

where o(1) goes to zero with increasing n, uniformly for t on any finite interval

$$0 \le |t| \le D. \tag{4.10}$$
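
The rate at which the bound (4.8) used for Lemma 2 goes to zero can be made explicit. The following worked step is a sketch added for illustration; it relies only on the elementary integral of $|s|(1 - s^2)^n$ and on the standard asymptotic form of the central binomial coefficient.

```latex
\int_{-1}^{1} |s|\,(1 - s^2)^n \, ds
  = 2 \int_{0}^{1} s\,(1 - s^2)^n \, ds
  = \frac{1}{n + 1},
\qquad
\frac{\Gamma(2n + 2)}{2^{2n+1}\,\Gamma^2(n + 1)}
  = \frac{2n + 1}{2} \cdot \frac{1}{4^n}\binom{2n}{n}
  \sim \sqrt{\frac{n}{\pi}},
\qquad\text{so}\qquad
\frac{B\,\Gamma(2n + 2)}{2^{2n+1}\,\Gamma^2(n + 1)} \cdot \frac{1}{n + 1}
  \sim \frac{B}{\sqrt{\pi n}} \to 0 .
```

Under these assumptions the remainder is of order $n^{-1/2}$ for t on a finite range.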


The above lemma indicates that if the asymptotic pdf of $z_m$ exists, it will be a
“stable” distribution in the normal sense [2]. In order to set the stage for proving
the existence of this asymptotic distribution we shall first investigate the third
logarithmic derivative of $G_n(t; z_m)$.


5. Investigation of third logarithmic derivative. We shall now show that the
third logarithmic derivative of G is uniformly bounded in some neighborhood of
$t = 0$. We first prove that the absolute value of the third derivative of G is
bounded for all t and n. Now the third derivative will have absolute value less
than the third absolute moment which I denote by $\bar{\mu}_3$. Using Liapounoff's
inequality

$$\bar{\mu}_3^2 \le \mu_2\, \mu_4 \tag{5.1}$$

one asks whether the fourth moment $\mu_4$ remains finite as n and m become infinite.
Computation of the fourth moment about the mean appears to be somewhat
formidable. However it is not so difficult to show that it remains finite with
increasing m and n. Referring to previous paper ([1] formulas (4.8)-(4.10))
we use quasi-moment generating function $g_0(x)$ such that

$$d^r g_0(0)/dx^r = \Gamma(pr + 1), \qquad g_0(0) = 1, \tag{5.2}$$

and it follows that

$$E\Big(\sum_m u_i^p\Big)^r = \frac{d^r [g_0(0)]^m}{dx^r}\, \frac{\Gamma(n + 1)}{\Gamma(n + 1 + pr)}, \tag{5.3}$$

and one recalls that

$$\bar{y} = \frac{m\, \Gamma(n + 1)\, \Gamma(p + 1)}{\Gamma(n + 1 + p)}, \qquad z = [(n + 1)^p/\sqrt{m}](y - \bar{y}).$$

The resulting fourth moment of z will be in the form of a fourth degree poly- 
nomial in m whose coefficients are of the type 


(n + 1)” T(n + 1) (n + 1)" T(n + 1) 


“Tm+it+4p) ’ Tar+1+3p)’ ’ 


with 


i al 


h 
) 
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combined with the first moment, with m™ appearing as a factor, By expansion of 
the Gamma function in asymptotic series in $(n + 1)$ it is not difficult to show
that the coefficient of $m^4$ becomes asymptotic like $(n + 1)^{-2}$, and that the
coefficient of $m^3$ becomes asymptotic like $(n + 1)^{-1}$. It follows that as n and m
go to infinity with $m \sim c(n + 1)$, this fourth moment approaches a finite
limit. Hence one concludes that the third derivative of G has bounded absolute
value for all n and t.


Since the absolute value of the first derivative of G is uniformly bounded for
finite t and all n it follows from the properties of a characteristic function that
given a positive number K less than unity, it is possible to find a value of $t = t_0$
greater than zero such that

$$0 < K < |G_n(t, z)| \le 1, \qquad 0 \le |t| \le t_0, \tag{5.4}$$

for all n.

From the above double inequality and the fact that the absolute values of the
first three derivatives are uniformly bounded it follows that the third logarithmic
derivative of G is uniformly bounded for all n on the interval

$$0 \le |t| \le t_0. \tag{5.5}$$
6. Proof that the asymptotic distribution of z exists and is normal. Since
absolute value of G is uniformly bounded away from zero on interval (5.5) one
can write the functional relation (4.9) as

$$\log G_N(t, z_M) = 2 \log G_n(t/\sqrt{2}, z_m) + o(1), \tag{6.1}$$

where o(1) goes to zero with increasing n uniformly for t on interval (5.5).
Introduce the notation:

$\lambda(n)$ equals variance of $z_m$,
$q(t, n)$ equals third logarithmic derivative of $G_n(t, z_m)$,
$R(t, N)$ equals remainder defined by

$$\log G_N(t, z_M) = -\lambda(N) t^2/2 + R(t, N). \tag{6.2}$$

Write

$$\log G_n(t/\sqrt{2}, z_m) = -\lambda(n) t^2/4 + q(t\theta/\sqrt{2}, n)\, t^3/(12\sqrt{2}), \qquad 0 < \theta < 1. \tag{6.3}$$

Substituting (6.2) and (6.3) in (6.1)

$$R(t, N) = [\lambda(N) - \lambda(n)] t^2/2 + [1/\sqrt{2}]\, q(t\theta/\sqrt{2}, n)\, t^3/6 + o(1). \tag{6.4}$$

By (3.9)

$$\lim \lambda(n) = \lim \lambda(N) = \text{positive number } \lambda. \tag{6.5}$$

We have proved that there exists an upper bound U such that

$$|q(t, n)| < U \tag{6.6}$$

for all n and for t on interval

$$0 \le |t| \le t_0. \tag{6.7}$$

Hence from (6.4) one can reason that given a positive $\epsilon$, a number $N_1$ can be
found such that

$$|R(t, N)| < [1/\sqrt{2}]\, U\, |t^3/6| + \epsilon \tag{6.8}$$

for all t on (6.7) and for $N > N_1$.
By (6.1)

$$R(t, 2N + 1) = [\lambda(2N + 1) - \lambda(N)] t^2/2 + 2R(t/\sqrt{2}, N) + o(1). \tag{6.9}$$

Using (6.8)
$$|R(t/\sqrt{2}, N)| < [1/\sqrt{2}]\, U\, |t^3/(12\sqrt{2})| + \epsilon.$$
Hence for any positive number $\epsilon_1$, a number $N_2$ can be found such that
$$|R(t, N)| < (1/2)\, U\, |t^3/6| + 2\epsilon + \epsilon_1, \qquad N > N_2,$$
for all t on (6.7). After k such operations, taking $\epsilon_i = \epsilon$

$$|R(t, N)| < (1/2)^{(k+1)/2}\, U\, |t^3/6| + (2^{k+1} - 1)\epsilon, \qquad N > N_{k+1}. \tag{6.10}$$

Thus given a positive number d one can determine k such that
$$(1/2)^{(k+1)/2}\, U\, |t^3|/6 < d/2,$$
and $\epsilon$ such that
$$(2^{k+1} - 1)\epsilon < d/2,$$
and therefore a number $N_{k+1}$ such that

$$|R(t, N)| < d, \qquad N > N_{k+1} \tag{6.11}$$

for all t on interval (6.7).
It follows that $G_N(t, z_M)$ converges uniformly to $\exp(-\lambda t^2/2)$ on interval (6.7).
Convergence of $G_N(t, z_M)$ for a value $t = t_1$ outside the interval (6.7) may be
proved by choosing integer k such that

$$0 < |t_1|/(\sqrt{2})^k \le t_0, \tag{6.12}$$

and taking
$$t_2 = t_1/(\sqrt{2})^k.$$
Recalling that the functional relation (4.9) holds for all finite t, this can be
applied k times, thus building up $t_2$ to $t_1$.
It follows from the continuity theorem that the distribution function of $z_m$
converges to the normal distribution function.


7. Statement of theorem proved. The proof given above has involved the
restriction that N be odd and M even (see (2.7)). This restriction is required
for the integral relationship (3.6). However, if N were even one could take
$n_0 = N/2$ and $n_1 = n_0 - 1$ and deal with $G_{n_0}$ and $G_{n_1}$ in the integrand. Also
if M were odd, one could take $m_0 = (M + 1)/2$, $m_1 = m_0 - 1$, and deal with
$G_{n_0}(t, m_0)$ and $G_{n_1}(t, m_1)$ in the integrand. This would of course carry with it
corresponding changes in the Gamma functions which precede the integral.
As long as we require that

$$N = n_0 + n_1 + 1, \qquad M = m_0 + m_1,$$
$$\lim M/N = \lim m_0/n_0 = \lim m_1/n_1 = c > 0,$$

the arguments used in arriving at the asymptotic relations (3.15) and (4.9)
will apply. Hence the theorem:


THEOREM. For a one dimensional statistical universe whose cdf is continuous,
consider the function of the unit frequency differences $u_i$

$$y = \sum_m u_i^p \tag{7.1}$$

taken from an ordered random sample of size n (see (2.2)) where p is any real
positive number, and m is any positive integer less than or equal to $n + 1$. The
selection of which m unit frequencies are to be included is arbitrary. Then with

$$\bar{y} = E(y) = \frac{m\, \Gamma(n + 1)\, \Gamma(p + 1)}{\Gamma(n + 1 + p)} \tag{7.2}$$

consider the partially normalized variable

$$z = \frac{(n + 1)^p}{\sqrt{m}}\, (y - \bar{y}). \tag{7.3}$$

If n goes to infinity, with m becoming infinite so that

$$\lim m/n = c > 0, \tag{7.4}$$

then the asymptotic cumulative distribution of z exists and is normal, with

$$\lim E(z^2) = \Gamma(2p + 1) - \Gamma^2(p + 1) - c\, p^2\, \Gamma^2(p + 1), \tag{7.5}$$

except in the trivial case $p = 1$, $m = n + 1$, in which case $z = 0$, and in the case
$p = 1$, $c = 1$.
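
The limiting variance in the theorem can be compared with the exact finite-sample variance of z. The sketch below is an illustration, not part of the paper: it assumes the standard fact that the $n + 1$ unit frequency differences are uniformly distributed on the simplex, so that $E(u_i^{2p}) = \Gamma(n+1)\Gamma(2p+1)/\Gamma(n+1+2p)$ and $E(u_i^p u_j^p) = \Gamma(n+1)\Gamma^2(p+1)/\Gamma(n+1+2p)$ for $i \ne j$. The function names are hypothetical.

```python
from math import exp, gamma, lgamma


def mean_y(n, m, p):
    """Expectation of y = sum of m terms u_i**p for a sample of size n."""
    return m * exp(lgamma(n + 1) + lgamma(p + 1) - lgamma(n + 1 + p))


def var_z(n, m, p):
    """Exact variance of the partially normalized variable z (assumed moments)."""
    mu = exp(lgamma(n + 1) + lgamma(p + 1) - lgamma(n + 1 + p))              # E(u^p)
    second = exp(lgamma(n + 1) + lgamma(2 * p + 1) - lgamma(n + 1 + 2 * p))  # E(u^(2p))
    cross = exp(lgamma(n + 1) + 2 * lgamma(p + 1) - lgamma(n + 1 + 2 * p))   # E(u_i^p u_j^p)
    var_y = m * (second - mu * mu) + m * (m - 1) * (cross - mu * mu)
    return (n + 1) ** (2 * p) / m * var_y


def limit_var(p, c):
    """Limiting variance stated in the theorem."""
    g = gamma(p + 1)
    return gamma(2 * p + 1) - g * g - c * p * p * g * g


if __name__ == "__main__":
    print(var_z(1000, 1001, 2.0), limit_var(2.0, 1.0))   # c = 1
    print(var_z(2000, 1000, 2.0), limit_var(2.0, 0.5))   # c = 1/2
```

Logarithms of Gamma functions are used to avoid overflow for large n.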
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EFFECT OF LINEAR TRUNCATION ON A MULTINORMAL POPULATION! 
By Z. W. Birnbaum²
University of Washington
1. Introduction. In admission to educational institutions, personnel selection,
testing of materials, and other practical situations, the following mathematical
model is frequently encountered: A $(k + l)$-dimensional random variable
$(X_1, X_2, \cdots, X_k, Y_1, Y_2, \cdots, Y_l) = (X, Y)$ is considered, with a joint
probability-distribution assumed to be non-singular multi-normal. The $Y_1, Y_2, \cdots, Y_l$ are
scores in admission tests, the $X_1, X_2, \cdots, X_k$ scores in achievement tests. The
admission tests are administered to all individuals in the (X, Y) population
to decide on admission or rejection, and (usually at some later time) the achievement
tests are administered to those admitted. A set of weights $a_j \ge 0$, $j = 1, 2, \cdots, l$
is used to define a composite admission test score $U = \sum_{j=1}^{l} a_j Y_j$
and a “cutting score” $\tau$ is chosen so that an individual is admitted if $U \ge \tau$,
and rejected if $U < \tau$. We will refer to this procedure as linear truncation of
(X, Y) in Y to the set $U \ge \tau$.

A linear truncation in Y clearly will change the absolute distribution of X,
except in the case of independence. In this paper a study is made of the absolute
distribution of X after linear truncation in Y in the case $k = 1$; in particular,
the possibility is investigated of choosing the $a_j$ and $\tau$ in such a way that the
distribution of X after truncation has certain desirable properties. The case
$k > 1$ leads to a considerable diversity of problems which are being studied and,
it is hoped, will be the subject of a separate paper.

Throughout this paper it will be assumed that all the parameters of (X, Y),
that is the expectations, variances and covariances before truncation, are
known. In practical situations it often happens that only the parameters of
$Y_1, Y_2, \cdots, Y_l$ before truncation are known, while the first and second moments
involving $X_1, X_2, \cdots, X_k$ are only known for the joint distribution after
truncation. It can be shown [1] that in such situations the expectations, variances
and covariances of (X, Y) before truncation can always be reconstructed if
(X, Y) has a multinormal distribution.

In the simplest case $k = l = 1$ the probability-density of the original binormal
random variable (X, Y) may be, without loss of generality, assumed
equal to

$$f(X, Y; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-(X^2 - 2\rho X Y + Y^2)/2(1 - \rho^2)}. \tag{1.1}$$

By truncating this distribution in Y to the set $Y \ge \tau$ one obtains the probability-density

$$g(X, Y; \rho, \tau) = \begin{cases} \psi^{-1}(\tau)\, f(X, Y; \rho), & \text{for } Y \ge \tau, \\ 0, & \text{for } Y < \tau, \end{cases} \tag{1.2}$$





1 Presented to the Institute of Mathematical Statistics on June 18, 1949. 
2 Research done under the sponsorship of the Office of Naval Research. 
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where 


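$$\psi(\tau) = \frac{1}{\sqrt{2\pi}} \int_\tau^{\infty} e^{-t^2/2}\, dt. \tag{1.3}$$

For further use we introduce the abbreviations

$$\varphi(\tau) = \frac{1}{\sqrt{2\pi}}\, e^{-\tau^2/2}, \tag{1.4}$$

$$\lambda(\tau) = \frac{\varphi(\tau)}{\psi(\tau)}. \tag{1.5}$$

We also note the inequalities

$$\lambda(\tau) > \tau \tag{1.6}$$

and

$$\lambda(\tau) < \frac{2}{\sqrt{4 + \tau^2} - \tau} \tag{1.7}$$

derived in [2] and [3], respectively.³
Before proceeding to the more-dimensional case, we will study some properties
of the marginal probability-distribution of X after truncation to $Y \ge \tau$:

$$\varphi(X; \rho, \tau) = \int_{-\infty}^{+\infty} g(X, Y; \rho, \tau)\, dY. \tag{1.8}$$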





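2. The moments of $\varphi(X; \rho, \tau)$. In this section all mathematical expectations
are computed for the absolute distribution of X after truncation of (X, Y) to
$Y \ge \tau$.
We have

$$\varphi(X; \rho, \tau) = \psi^{-1}(\tau)\, \varphi(X)\, \psi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right)$$

and hence

$$E(X^n) = \int_{-\infty}^{+\infty} X^n \varphi(X; \rho, \tau)\, dX = \psi^{-1}(\tau) \int_{-\infty}^{+\infty} \frac{X^n}{\sqrt{2\pi}}\, e^{-X^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{(\tau - \rho X)/\sqrt{1 - \rho^2}}^{+\infty} e^{-S^2/2}\, dS\, dX$$

$$= \psi^{-1}(\tau) \left\{ \left[ -X^{n-1} \varphi(X)\, \psi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) \right]_{X=-\infty}^{+\infty} + \int_{-\infty}^{+\infty} \varphi(X)\, \frac{dX^{n-1}}{dX}\, \psi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX + \frac{\rho}{\sqrt{1 - \rho^2}} \int_{-\infty}^{+\infty} X^{n-1} \varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX \right\}$$

$$= E\!\left(\frac{dX^{n-1}}{dX}\right) + \frac{\rho}{\psi(\tau)\sqrt{1 - \rho^2}} \int_{-\infty}^{+\infty} X^{n-1} \varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX.$$

³ Implicitly, the inequality (1.6) was known already to Laplace, cf. Mécanique Céleste,
transl. by Bowditch, Boston 1839, Vol. 4, p. 493.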

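From the identity

$$\varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) = \varphi(\tau)\, \varphi\!\left(\frac{X - \rho\tau}{\sqrt{1 - \rho^2}}\right) \tag{2.0}$$

we obtain

$$\int_{-\infty}^{+\infty} X^{n-1} \varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX = \varphi(\tau) \int_{-\infty}^{+\infty} X^{n-1} \varphi\!\left(\frac{X - \rho\tau}{\sqrt{1 - \rho^2}}\right) dX = \sqrt{1 - \rho^2}\, \varphi(\tau) \int_{-\infty}^{+\infty} (S\sqrt{1 - \rho^2} + \rho\tau)^{n-1} \varphi(S)\, dS,$$

and hence

$$E(X^n) = E\!\left(\frac{dX^{n-1}}{dX}\right) + \rho\lambda(\tau) \int_{-\infty}^{+\infty} (S\sqrt{1 - \rho^2} + \rho\tau)^{n-1} \varphi(S)\, dS, \tag{2.1}$$

for $n \ge 1$.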
For $n = 1$ this yields for the expectation of X after truncation

$$E(X) = \rho\lambda(\tau). \tag{2.2}$$

For $n = 2$ we have from (2.1)

$$E(X^2) = 1 + \rho^2\tau\lambda(\tau) = 1 + \rho\tau E(X),$$

and hence for the variance of X after truncation the expression

$$\sigma^2(X) = 1 + E(X)[\rho\tau - E(X)], \tag{2.3}$$

or

$$\sigma^2(X) = 1 - \rho^2\lambda(\tau)[\lambda(\tau) - \tau]. \tag{2.31}$$

From (2.2) we see that E(X) always has the sign of $\rho$, as one would expect.
From (2.3) one finds a lower bound for $\tau$

$$\tau > \frac{E^2(X) - 1}{\rho E(X)}. \tag{2.4}$$

From (2.31) and (1.6) one concludes that $\sigma^2(X) < 1$ for $\rho \ne 0$, hence the
variance of X after truncation is always less than the variance before truncation,
except if $\rho = 0$.

Similarly one computes from (2.1) the third moment about zero

$$E(X^3) = E(X)[3 - \rho^2(1 - \tau^2)]$$

and obtains for the third moment about the expectation

$$E(X - E(X))^3 = E(X)\rho^2\{[\lambda(\tau) - \tau][2\lambda(\tau) - \tau] - 1\}. \tag{2.5}$$
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Numerical computation indicates that the quantity in braces is always >0, 
which would mean that the skewness of X after truncation has the same sign as 
E(X) and p. No analytic proof of this statement has been obtained. 
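
The closed forms for the mean, variance and third central moment of X after truncation can be compared with a direct numerical integration of the marginal density. The sketch below is illustrative only: it assumes $\lambda(\tau) = \varphi(\tau)/\psi(\tau)$ with $\varphi$ the standard normal density and $\psi$ its upper tail probability, and it uses a plain midpoint rule on $[-10, 10]$. The function names and the example values $\rho = 0.6$, $\tau = 0.5$ are arbitrary choices.

```python
from math import erfc, exp, pi, sqrt


def phi(t):
    """Standard normal density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def psi(t):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * erfc(t / sqrt(2.0))


def lam(t):
    """Ratio phi / psi (assumed definition of lambda)."""
    return phi(t) / psi(t)


def moments_formula(rho, tau):
    """Mean (2.2), variance (2.31) and third central moment (2.5)."""
    l = lam(tau)
    mean = rho * l
    var = 1.0 - rho ** 2 * l * (l - tau)
    third = mean * rho ** 2 * ((l - tau) * (2.0 * l - tau) - 1.0)
    return mean, var, third


def moments_numeric(rho, tau, lo=-10.0, hi=10.0, steps=20000):
    """Same three moments by midpoint integration of the marginal density."""
    h = (hi - lo) / steps
    scale = sqrt(1.0 - rho * rho)
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    ws = [phi(x) * psi((tau - rho * x) / scale) / psi(tau) * h for x in xs]
    mean = sum(x * wt for x, wt in zip(xs, ws))
    var = sum((x - mean) ** 2 * wt for x, wt in zip(xs, ws))
    third = sum((x - mean) ** 3 * wt for x, wt in zip(xs, ws))
    return mean, var, third


if __name__ == "__main__":
    print(moments_formula(0.6, 0.5))
    print(moments_numeric(0.6, 0.5))
```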


3. Determination of $\tau$ for given expectation or quantile of X after truncation;
dependence of this $\tau$ on $\rho$. Let it be required to determine $\tau$ so that the expectation
of X after truncation assumes a given value m. It follows immediately from
(2.2) that this $\tau$ is obtained by solving the equation

$$\lambda(\tau) = \frac{m}{\rho} \tag{3.1}$$

for $\tau$, which can be done with the aid of a table⁴ of $\lambda(\tau)$.
Another problem which occurs in applications consists in determining $\tau$ so
that, for given $0 < \alpha < 1$ and $X_\alpha$, the $\alpha$-quantile for X after truncation assumes
the value $X_\alpha$, that is so that

$$\int_{-\infty}^{X_\alpha} \varphi(X; \rho, \tau)\, dX = \psi^{-1}(\tau) \int_{-\infty}^{X_\alpha} \int_\tau^{+\infty} f(X, Y; \rho)\, dY\, dX = \alpha. \tag{3.2}$$

Let

$$P(s, t; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \int_s^{+\infty} \int_t^{+\infty} e^{-(X^2 - 2\rho X Y + Y^2)/2(1 - \rho^2)}\, dY\, dX \tag{3.21}$$

denote the volume of the probability solid $Z = f(X, Y; \rho)$ above the quadrant
$X \ge s$, $Y \ge t$. Then (3.2) may be written in the form

$$1 - \frac{P(X_\alpha, \tau; \rho)}{\psi(\tau)} = \alpha,$$

or

$$(1 - \alpha)\psi(\tau) = P(X_\alpha, \tau; \rho), \tag{3.3}$$

and this equation can be solved for $\tau$ by trial with the aid of tables of $\psi(\tau)$ and
Pearson's tables [4] of $P(s, t; \rho)$.
Lemma 1. For fixed expectation of X after truncation $E(X) = m$, the solution $\tau(\rho)$
of (3.1) is a strictly decreasing function of the absolute value of $\rho$ for $0 < |\rho| < 1$.
Proof: Differentiating $m = \rho\lambda(\tau)$ with regard to $\rho$ one obtains

$$0 = \lambda(\tau) + \rho\lambda'(\tau)\frac{d\tau}{d\rho}$$

and, in view of the identity

$$\lambda'(\tau) = \lambda(\tau)[\lambda(\tau) - \tau],$$

the expression

$$\frac{d\tau}{d\rho} = -\frac{1}{\rho[\lambda(\tau) - \tau]}. \tag{3.4}$$

⁴ A table of $1/\lambda(\tau)$ is, for example, given in Karl Pearson, Tables for Statisticians and
Biometricians, Part II, 1931, pp. 11-15.

From (3.4) and (1.6) we see that

$$\operatorname{sign} \frac{d\tau}{d\rho} = -\operatorname{sign} \rho,$$

which proves our lemma.
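
In place of a table of $\lambda(\tau)$, the equation $\lambda(\tau) = m/\rho$ can be solved numerically. The following sketch uses bisection, which is justified because $\lambda$ is increasing; it is an illustration under the assumption $\lambda(\tau) = \varphi(\tau)/\psi(\tau)$, and the bracket $[-8, 8]$, the tolerance and the function names are arbitrary choices made here.

```python
from math import erfc, exp, pi, sqrt


def lam(t):
    """Ratio of the standard normal density to its upper tail probability."""
    return (exp(-0.5 * t * t) / sqrt(2.0 * pi)) / (0.5 * erfc(t / sqrt(2.0)))


def solve_tau(m, rho, lo=-8.0, hi=8.0, tol=1e-12):
    """Cutting score tau for which the expectation of X after truncation is m."""
    target = m / rho
    if not lam(lo) < target < lam(hi):
        # Includes the case sign(m) != sign(rho), for which no solution exists.
        raise ValueError("m / rho is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (0.6, 0.75, 0.9):
        # For fixed m the solution decreases as |rho| grows, as Lemma 1 states.
        print(rho, solve_tau(0.5, rho))
```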
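Lemma 2. For fixed $\alpha$, $X_\alpha$, the solution $\tau = \tau(\rho)$ of (3.3) is a strictly decreasing
function of $|\rho|$ for $0 < |\rho| < 1$.
Proof: Differentiating (3.3) with regard to $\rho$ one obtains

$$-(1 - \alpha)\varphi(\tau)\frac{d\tau}{d\rho} = \frac{\partial P}{\partial \tau}\frac{d\tau}{d\rho} + \frac{\partial P}{\partial \rho}, \tag{3.5}$$

and hence

$$\frac{d\tau}{d\rho} = -\frac{\dfrac{\partial P}{\partial \rho}}{\dfrac{\partial P}{\partial \tau} + (1 - \alpha)\varphi(\tau)}. \tag{3.6}$$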
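From (3.21) one easily verifies that

$$\frac{\partial P(X_\alpha, \tau; \rho)}{\partial \rho} = \frac{\varphi(\tau)}{\sqrt{2\pi(1 - \rho^2)}} \int_{(X_\alpha - \rho\tau)/\sqrt{1 - \rho^2}}^{+\infty} t\, e^{-t^2/2}\, dt,$$

and therefore

$$\frac{\partial P(X_\alpha, \tau; \rho)}{\partial \rho} > 0. \tag{3.7}$$

One also computes

$$\frac{\partial P(X_\alpha, \tau; \rho)}{\partial \tau} = -\varphi(\tau)\, \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right),$$

so that the denominator of the right hand expression in (3.6) becomes

$$\varphi(\tau)\left[(1 - \alpha) - \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right].$$

In view of (3.3) this is equal to

$$\varphi(\tau)\left[\frac{P(X_\alpha, \tau; \rho)}{\psi(\tau)} - \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right] = \lambda(\tau)\left[P(X_\alpha, \tau; \rho) - \psi(\tau)\, \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right]$$

$$= \lambda(\tau) \int_\tau^{+\infty} \int_{(X_\alpha - \rho Y)/\sqrt{1 - \rho^2}}^{(X_\alpha - \rho\tau)/\sqrt{1 - \rho^2}} \frac{1}{2\pi}\, e^{-(Y^2 + U^2)/2}\, dU\, dY = \lambda(\tau) \int_\tau^{+\infty} h(Y)\, dY.$$

If $\rho > 0$, then $\rho Y > \rho\tau$ in the interval of integration $\tau < Y < \infty$, hence
$\frac{X_\alpha - \rho Y}{\sqrt{1 - \rho^2}} < \frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}$, therefore the integrand h(Y) is positive, and so is the
denominator of (3.6). Similarly one sees that if $\rho < 0$ the integrand h(Y) is
negative for $\tau < Y < \infty$ and the denominator of (3.6) is negative. In view of
(3.7) we conclude

$$\operatorname{sign} \frac{d\tau}{d\rho} = -\operatorname{sign} \rho \quad \text{for } \rho \ne 0.$$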


4. Linear truncation of $(X, Y_1, Y_2, \cdots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j \ge \tau$ for
given expectation or quantile of X, minimizing the rejected part of the population.
Let $(X, Y_1, Y_2, \cdots, Y_l)$ be an $(l + 1)$-dimensional non-singular normal random
variable with all expectations, variances and covariances known. We wish to
choose $a_1, a_2, \cdots, a_l$ and $\tau$ so that by setting

$$U = \sum_{j=1}^{l} a_j Y_j \tag{4.1}$$

and performing the linear truncation to the set $U \ge \tau$ we obtain for the expectation
of X after truncation a pre-assigned value m, and that this is achieved with
the least waste of the original population, that is so that for the non-truncated
probability-distribution the probability $P(\sum_{j=1}^{l} a_j Y_j < \tau)$ is minimum.
Without loss of generality we may assume that, before truncation, we have

$$E(X) = E(Y_1) = \cdots = E(Y_l) = 0, \tag{4.21}$$

$$\sigma(X) = 1, \tag{4.22}$$

and thus

$$E(U) = 0. \tag{4.3}$$

Furthermore, the $a_j$ and $\tau$ can always be multiplied by a constant, without
changing the set of truncation, so that we have

$$\sigma(U) = 1. \tag{4.4}$$


THEOREM 1. To truncate (X, Y1, Ye, --- , Yi) linearly in Y,, Y2, «++ , Yi so 
that the expectation of X after truncation has the given value m and that the probability 
of the rejected part of the original population is minimum, it is necessary and 
sufficient (1) to determine a, , G2, «++ , a: 80 that the absolute value of the correlation- 
coefficient p(X, U) becomes maximum under the condition (4.4), and (2) for U 
determined by these a; , @2, °** , a: and for p = p(X, U) to solve equation (8.1) for r. 

The proof of this theorem follows immediately from the first paragraph of 
section 3 and Lemma 1. 

Using the second paragraph of section 3 and Lemma 2, one equally easily 
arrives at the following theorem: 

THEOREM 2. To truncate $(X, Y_1, Y_2, \ldots, Y_l)$ linearly in $Y_1, Y_2, \ldots, Y_l$
so that the $\alpha$-quantile of X after truncation has the given value $x_\alpha$ and that the
probability of the rejected part of the original population is minimum, it is necessary
and sufficient to satisfy (1) in Theorem 1 and then to solve equation (3.3).

The problem of satisfying requirement (1) of Theorems 1 and 2 can be solved
effectively by a method due to Hotelling [5]. It may be worth noting that this
method yields two sets of constants, $a_1, a_2, \ldots, a_l$ and $-a_1, -a_2, \ldots, -a_l$,
both maximizing $|\rho(X, U)|$ but leading to values of $\rho(X, U)$ with opposite
signs. Nevertheless the choice between $a_1, a_2, \ldots, a_l$ and $-a_1, -a_2, \ldots, -a_l$
and the determination of $\tau$ are unique for any given m, since (3.1) has a solution
for $\tau$ only if sign $\rho$ = sign m.
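The two steps of Theorem 1 can be sketched numerically for $l = 2$. The sketch below rests on two assumptions that are not taken from this section: equation (3.1) is assumed to have the standard form $m = \rho\,\lambda(\tau)$ with $\lambda(\tau) = \varphi(\tau)/(1 - \Phi(\tau))$, and step (1) is solved by the regression formula (weights proportional to the solution of the normal equations) rather than by Hotelling's method. The function names are hypothetical.

```python
import math

def inverse_mills(tau):
    """lambda(tau) = phi(tau) / (1 - Phi(tau)) for the standard normal law."""
    phi = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return phi / upper_tail

def best_weights_2d(cov_xy, cov_yy):
    """Weights (a1, a2) with sigma(U) = 1 maximizing rho(X, U), assuming sigma(X) = 1."""
    (s11, s12), (s21, s22) = cov_yy
    det = s11 * s22 - s12 * s21
    # Solve cov_yy * a = cov_xy by Cramer's rule (unnormalized weights).
    a1 = (cov_xy[0] * s22 - s12 * cov_xy[1]) / det
    a2 = (s11 * cov_xy[1] - s21 * cov_xy[0]) / det
    # Rescale so that Var(U) = 1, which is condition (4.4).
    var_u = a1 * a1 * s11 + 2.0 * a1 * a2 * s12 + a2 * a2 * s22
    scale = math.sqrt(var_u)
    a1, a2 = a1 / scale, a2 / scale
    rho = a1 * cov_xy[0] + a2 * cov_xy[1]
    return (a1, a2), rho

def solve_tau(m, rho, lo=-8.0, hi=8.0):
    """Bisection for tau in the assumed form m = rho * lambda(tau)."""
    if m * rho <= 0:
        raise ValueError("a solution requires sign(rho) = sign(m)")
    target = m / rho
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if inverse_mills(mid) < target:  # lambda is increasing in tau
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

weights, rho = best_weights_2d((0.5, 0.3), ((1.0, 0.2), (0.2, 1.0)))
tau = solve_tau(0.3, rho)
print(weights, rho, tau)
```

For a negative target m one would take the weights with reversed signs, in line with the remark above that only one of the two sign choices admits a solution.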


5. Linear truncation of $(X, Y_1, Y_2, \ldots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j > \tau$ for given
expectation of X after truncation, minimizing the variance of X after truncation.
It may be of practical interest to choose $a_1, a_2, \ldots, a_l$ and $\tau$ so that, with
the notations and under the assumptions of section 4, the expectation of X
after truncation becomes equal to a given number m, and the variance after
truncation is minimum.

THEOREM 3. To truncate $(X, Y_1, Y_2, \ldots, Y_l)$ linearly in $Y_1, Y_2, \ldots, Y_l$ so
that the expectation after truncation has the given value m and that, under this
condition, the variance of X after truncation becomes as small as possible, it is
necessary and sufficient to satisfy the conditions (1) and (2) of Theorem 1.

The proof of this theorem follows from section 3 and the following lemma:

LEMMA 3. For fixed $E(X) = m$, the variance $\sigma^2(X)$ after truncation is a strictly
decreasing function of the absolute value of $\rho$ for $0 < |\rho| < 1$.

PROOF: According to (2.3) we have

$\sigma^2(X) = 1 + m(\rho\tau - m)$.

Differentiating with regard to $\rho$ and using (3.4) we have

$\dfrac{d\sigma^2(X)}{d\rho} = m\,\dfrac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau}$.

For $\tau \le 0$ the numerator $\tau[\lambda(\tau) - \tau] - 1$ clearly is $< 0$. For $\tau > 0$ inequality (1.7) yields
Mr) — 7] —1 S 374 4 2? — 37° — 2) 
< 3[r(2 + 7) — 387° — 2] = 7(1 — 7) — 1, 
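and this is $< 0$ for $\tau > 0$. Together with (1.6), this proves that

$\dfrac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau} < 0$

for all $\tau$, and hence according to (3.1)

$\operatorname{sign}\dfrac{d\sigma^2(X)}{d\rho} = -\operatorname{sign} m = -\operatorname{sign}\rho$.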
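Lemma 3 can be checked numerically. The sketch below uses the variance formula quoted from (2.3) and, as an assumption not confirmed by this section, takes (3.1) to be $m = \rho\,\lambda(\tau)$ with $\lambda$ the inverse Mills ratio. For a fixed m it solves for $\tau$ at several values of $\rho$ and lists the resulting variances.

```python
import math

def inverse_mills(tau):
    """lambda(tau) = phi(tau) / (1 - Phi(tau)) for the standard normal law."""
    phi = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(tau / math.sqrt(2.0)))

def tau_for(m, rho, lo=-8.0, hi=8.0):
    """Bisection for tau in the assumed relation m = rho * lambda(tau)."""
    target = m / rho
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if inverse_mills(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_after_truncation(m, rho):
    """sigma^2(X) = 1 + m (rho * tau - m), with tau chosen so that E(X) = m."""
    tau = tau_for(m, rho)
    return 1.0 + m * (rho * tau - m)

m = 0.3
rhos = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
variances = [variance_after_truncation(m, r) for r in rhos]
for r, v in zip(rhos, variances):
    print(r, round(v, 4))
```

In this example the printed variances fall as $\rho$ grows, which is the behaviour the lemma asserts.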
It may be conjectured that the sign of $d\sigma^2(X)/d\rho$ is opposite to that of $\rho$ also in the
case when $\sigma^2(X)$ is the variance after truncation minimized under condition
(3.3). This would lead to a theorem stating that the same choice of $a_1, a_2, \ldots, a_l$
and $\tau$ which according to Theorem 2 makes the $\alpha$-quantile after truncation
equal to the given number $x_\alpha$ and minimizes the rejected part of the original
population, will also minimize the variance of X after truncation.

NOTES
This section is devoted to brief research and expository articles and other short items.

EXTENSION OF A THEOREM OF BLACKWELL¹

By E. W. Barankin
University of California, Berkeley


1. Introduction. In [1] (§1) the author has announced, as bearing on the
results there, that Blackwell's method [2] of uniformly improving the variance
of an unbiased estimate by taking the conditional expectation with respect to a
sufficient statistic, is in fact similarly effective on every absolute central moment
of order $s \ge 1$. Our purpose here is to establish this. In addition, the equality
condition (null improvement of the moment) is presented in terms of a primitive
property of the estimate. The asserted uniform diminution of the s-th moments
for a family W of distributions is, as in the case s = 2, a twice removed
consequence of the fundamental fact for a single distribution that the absolute s-th
power of the conditional expectation of a measurable function is almost everywhere
(a.e.) not greater than the conditional expectation of the absolute s-th
power of the function. This is the substance of the theorem below. The second
corollary then states the result for unbiased estimates.


2. Preliminaries. Let $\Omega$ be a space of points x; $\mathcal{F}$, a $\sigma$-field of subsets of $\Omega$;
and $\mu$, a probability measure on $\mathcal{F}$. Let t be a function on $\Omega$ onto a space T of
points $\tau$; $\mathcal{T}^t$, a $\sigma$-field of subsets of T; and $\mathcal{T}$, a sub-$\sigma$-field of $\mathcal{F}$, the inverse of
$\mathcal{T}^t$ under t. A set in $\mathcal{T}^t$ will be denoted by $A^t$, where A is its inverse under t.
Let $\nu$ denote the measure on $\mathcal{T}^t$ defined by $\nu(A^t) = \mu(A)$.

If f is a real-valued,² $\mathcal{F}$-measurable, $\mu$-integrable function on $\Omega$, we denote by
$E(f \mid \cdot)$ the conditional expectation of f with respect to t. Corresponding to any
particular function h on T (as, for example, $E(f \mid \cdot)$) we define the function
$h^*$ on $\Omega$ by

$h^*(x) = h(\tau)$, $t(x) = \tau$.


The qualification “essentially” prefixing a statement will mean that with the 
possible exception of a set of points of measure 0, that statement holds true. 

The following two simple lemmas enable us to present the conditions for 
equality, in the results below, in terms of the elementary characteristics of the 
function f. 

¹ This note was prepared under O. N. R. contract.

² With no changes in this note, and only minor changes in [1], the results we have set
forth concerning unbiased estimation pertain as well to complex-valued functions.

LEMMA 1. A necessary and sufficient condition that $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$
a.e. ($\mu$) is that sgn f be essentially a function of t.
The necessity of the condition is clear. To prove sufficiency, let $f'$ be a function
on $\Omega$ which is a.e. equal to f, and such that sgn $f'$ is an (unqualified) function of t.
Now if $\operatorname{sgn} f'(x) = \operatorname{sgn} E^*(f \mid x)$ does not hold a.e. ($\mu$), then there is a $\mathcal{T}$-set, A,
of positive measure, such that, for example, for $x \in A$, $f'(x) > 0$ while $E^*(f \mid x) \le 0$.
We then have the contradiction

$0 < \int_A f'\,d\mu = \int_A f\,d\mu = \int_A E^*(f \mid \cdot)\,d\mu \le 0.$

LEMMA 2. A necessary and sufficient condition that $f(x) = E^*(f \mid x)$ a.e. ($\mu$)
is that f be essentially a function of t.
Again the necessity is obvious. To show sufficiency, let $f'$ be a function on $\Omega$
which is a.e. equal to f, and is an (unqualified) function of t. Define h on T by

$h(\tau) = f'(x)$, $t(x) = \tau$.

Then $h^* = f'$, and we have

$\int_A f\,d\mu = \int_A h^*\,d\mu = \int_{A^t} h\,d\nu, \qquad A \in \mathcal{T}.$

But this implies that $h(\tau) = E(f \mid \tau)$ a.e. ($\nu$), and therefore $f(x) = E^*(f \mid x)$
a.e. ($\mu$), as was to be shown.


3. Results. For a proof of the Hölder inequality that we use in establishing the
following theorem, we refer the reader to [3] (p. 233).
THEOREM.³ Let $s \ge 1$. Then for almost all ($\mu$) x,

(1) $|E^*(f \mid x)|^s \le E^*(|f|^s \mid x)$.⁴

Equality holds a.e.
(i) for s = 1, if and only if sgn f is essentially a function of t;
(ii) for s > 1, if and only if f is essentially a function of t.
PROOF: Consider first the case s = 1. Let

$S = \{x \in \Omega \mid E^*(f \mid x) > 0\}$,

$S' = \Omega - S$.

Then, for any $A \in \mathcal{T}$,

$\int_A |E^*(f \mid \cdot)|\,d\mu = \int_{SA} E^*(f \mid \cdot)\,d\mu - \int_{S'A} E^*(f \mid \cdot)\,d\mu$

$= \int_{SA} f\,d\mu - \int_{S'A} f\,d\mu \le \int_A |f|\,d\mu = \int_A E^*(|f| \mid \cdot)\,d\mu.$

³ The proof we present here was suggested by the referee, and is much shorter than
our own.

⁴ For s = 1 this inequality was used by Doob in "Regularity properties of certain families
of chance variables", Trans. Amer. Math. Soc., Vol. 47 (1940), pp. 455-486 (Theorem 0.2).
Since A is arbitrary, we have the result (1) with s = 1. It is clear that the equality
sign holds a.e. ($\mu$) if and only if, except possibly for a set of measure 0, f is positive
on S and non-positive on $S'$; that is, if and only if $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ a.e.
($\mu$). Applying Lemma 1, we have the equality condition as stated in the theorem.

Now let s > 1. To establish (1) it will suffice, by virtue of what has already
been proved for s = 1, to consider $f \ge 0$ a.e. ($\mu$). We may then argue as follows.
Unless (1) holds a.e., there is a $\mathcal{T}$-set, R, of positive measure, and numbers
$a > b \ge 0$ such that for $x \in R$,

$[E^*(f \mid x)]^s \ge a$,
and
$E^*(f^s \mid x) \le b$.

But then, with an application of the Hölder inequality we meet a contradiction.
For,

$a[\mu(R)]^s \le \left\{\int_R E^*(f \mid \cdot)\,d\mu\right\}^s = \left\{\int_R f\,d\mu\right\}^s$

$\le [\mu(R)]^{s-1}\int_R f^s\,d\mu = [\mu(R)]^{s-1}\int_R E^*(f^s \mid \cdot)\,d\mu$

$\le b[\mu(R)]^s$,

which contradicts a > b. Thus, (1) is proved in general.

If $f(x) = E^*(f \mid x)$ a.e. ($\mu$), it is readily proved by a direct argument that then
equality holds in (1) a.e. ($\mu$). Conversely, suppose equality in (1) holds a.e.
Then we have, in fact, a.e.,

(2) $|E^*(f \mid x)| = E^*(|f| \mid x)$,

and

(3) $[E^*(|f| \mid x)]^s = E^*(|f|^s \mid x)$.

For brevity, denote the function $E^*(|f| \mid \cdot)$ by v. Since f vanishes at almost all
points where v vanishes, we may write $|f| = w \cdot v$, where

$w(x) = \begin{cases} 1, & v(x) = 0, \\ |f(x)|/v(x), & v(x) > 0. \end{cases}$

(If v vanishes almost everywhere, we are through.) For any $\mathcal{T}$-measurable,
real-valued function, u, on $\Omega$, we have

(4) $\int_\Omega u \cdot v\,d\mu = \int_\Omega u \cdot v \cdot w\,d\mu$,

when either of these integrals exists (cf. [4], p. 50, eq. (15)). Similarly, and
taking account of the equality assumption (3) we have

(5) $\int_\Omega u \cdot v^s\,d\mu = \int_\Omega u \cdot v^s \cdot w^s\,d\mu$.

In particular, consider the two functions

$u_1(x) = \begin{cases} 1/v(x), & v(x) > 0, \\ 0, & v(x) = 0, \end{cases}$

and

$u_2(x) = \begin{cases} 1/[v(x)]^s, & v(x) > 0, \\ 0, & v(x) = 0. \end{cases}$

If

$S_0 = \{x \in \Omega \mid v(x) > 0\}$,

it is seen that $u_1$, taken in conjunction with (4), and $u_2$ taken in conjunction with
(5), bring out

$\int_{S_0} w\,d\mu = \int_{S_0} w^s\,d\mu = \mu(S_0).$

From this it follows (e.g., by the equality condition attending the Hölder
inequality) that $w(x) = 1$ a.e. in $S_0$. Hence $|f(x)| = v(x)$ a.e. in $\Omega$. Therefore,
by (2), $|f(x)| = |E^*(f \mid x)|$ a.e. But (2) also implies, as already shown, $\operatorname{sgn} f(x) =
\operatorname{sgn} E^*(f \mid x)$ a.e. Thus, finally, we have $f(x) = E^*(f \mid x)$ a.e. Now apply
Lemma 2, and the proof of the theorem is complete.
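Inequality (1) and its integrated form can be illustrated on a finite space, where the conditional expectation is a weighted mean over each level set of t. The following is a minimal sketch: the six-point space, the probabilities, the values of f and the partition labels are all made up for the illustration.

```python
def conditional_expectation(values, probs, labels):
    """E*(values | x) at each point x: the mu-weighted mean over the cell of t containing x."""
    mass, total = {}, {}
    for v, p, lab in zip(values, probs, labels):
        mass[lab] = mass.get(lab, 0.0) + p
        total[lab] = total.get(lab, 0.0) + p * v
    return [total[lab] / mass[lab] for lab in labels]

# Hypothetical example: six points, three cells of the statistic t.
probs = [0.10, 0.20, 0.15, 0.25, 0.05, 0.25]
f = [-2.0, 1.0, 3.0, 3.0, -1.0, 4.0]
labels = ["a", "a", "b", "b", "c", "c"]

def pointwise_gap(s):
    """E*(|f|^s | x) - |E*(f | x)|^s at every point; (1) says this is never negative."""
    cond_f = conditional_expectation(f, probs, labels)
    cond_abs = conditional_expectation([abs(v) ** s for v in f], probs, labels)
    return [rhs - abs(lhs) ** s for lhs, rhs in zip(cond_f, cond_abs)]

def central_moment(values, s):
    """s-th absolute moment of `values` about g0, the expectation of f."""
    g0 = sum(p * v for p, v in zip(probs, f))
    return sum(p * abs(v - g0) ** s for p, v in zip(probs, values))

cond = conditional_expectation(f, probs, labels)
for s in (1.0, 1.5, 2.0, 3.0):
    print(s, min(pointwise_gap(s)), central_moment(cond, s), central_moment(f, s))
```

On cell "b" the function f is constant, so the gap there is zero, which matches the equality condition of the theorem restricted to that cell.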

COROLLARY 1. Let $s \ge 1$, and let $g_0$ denote the expectation of f. Then

(6) $\int_\Omega |E^*(f \mid \cdot) - g_0|^s\,d\mu \le \int_\Omega |f - g_0|^s\,d\mu.$

Equality holds

(i) for s = 1, if and only if sgn $[f - g_0]$ is essentially a function of t;

(ii) for s > 1, if and only if f is essentially a function of t.

This result expresses the domination over the s-th absolute central moment
of the conditional expectation of f by the corresponding moment of f itself. It
follows almost immediately from the theorem when we write (6) in the form

(7) $\int_\Omega |E^*(f - g_0 \mid \cdot)|^s\,d\mu \le \int_\Omega E^*(|f - g_0|^s \mid \cdot)\,d\mu.$

Thus, from the theorem we know that the integrand of the left-hand side of (7)
is a.e. $\le$ the integrand on the right. Hence (7) holds. Equality in (7) holds then
if and only if the integrands are a.e. equal. The theorem therefore directly
provides the equality conditions as stated.

Let $W = \{\mu_\theta, \theta \in \Theta\}$ be a family of probability measures on $\mathcal{F}$; and t, a sufficient
statistic for W (cf. [5], p. 232, §5). Let f be an unbiased estimate of the function
g on $\Theta$. For each $\mu_\theta \in W$, the conditional expectation, $E_\theta(f \mid \cdot)$, of f with respect
to t is defined. Since conditional expectations are fully determined by conditional
probabilities (although, in general, not as usual integrals; cf. [4], pp. 48, 49;
also [5], p. 230) it follows from the sufficiency of t that there exists a function
$E(f \mid \cdot)$, on T, with $E_\theta(f \mid \tau) = E(f \mid \tau)$ a.e. ($\nu_\theta$) for each $\theta \in \Theta$. $E^*(f \mid \cdot)$ is again
an unbiased estimate of g, and we have

COROLLARY 2. Let t be a sufficient statistic for the family $W = \{\mu_\theta, \theta \in \Theta\}$;
and f, an unbiased estimate of g. For $s \ge 1$, and each $\theta \in \Theta$,

$\int_\Omega |E^*(f \mid \cdot) - g(\theta)|^s\,d\mu_\theta \le \int_\Omega |f - g(\theta)|^s\,d\mu_\theta.$

Equality holds
(i) for s = 1, if and only if sgn $[f - g(\theta)]$ is essentially ($\mu_\theta$) a function of t;
(ii) for s > 1, if and only if f is essentially ($\mu_\theta$) a function of t.

NOTE ON CONSISTENT ESTIMATES OF THE LINEAR STRUCTURAL
RELATION BETWEEN TWO VARIABLES¹

By Elizabeth L. Scott

University of California, Berkeley


1. Introduction. The purpose of this note is to present another case in which 
the structural linear relation between two observable random variables may be 
consistently estimated. Of the recent papers on this subject I wish to mention the 
paper by Wald [1], which contains a history of the work done on the problem, 
and the more recent paper by Housner and Brennan [2]. Also relevant is the 
important result due to Reiersøl [3], [4].

2. Statement of problem. Assume that the two observable random variables
x and y have the structure

(1) $x = \xi + u$, $\quad y = \alpha + \beta\xi + v$,

where $\alpha$ and $\beta$ are unknown parameters to be estimated, and $\xi$, u and v are
completely independent random variables. The latter two variables, interpreted
as the random errors of measurement, are assumed to vary normally
about zero with unknown variances $\sigma_1^2$ and $\sigma_2^2$, respectively.

An increasing number n of completely independent pairs of simultaneous
values of x and y are to be observed

(2) $(x_i, y_i)$, $\quad i = 1, 2, \ldots, n$,

so that each pair $(x_i, y_i)$ corresponds to a value $\xi_i$ of the unobservable random
variable $\xi$ which is independent of the value $\xi_j$ of $\xi$ corresponding to any other
pair $(x_j, y_j)$, $i \neq j$.

¹ Paper prepared with partial support of the Office of Naval Research.
The results summarized were presented in a discussion held at the Cleveland Meeting
of the Institute of Mathematical Statistics, December, 1948.

It is well known that if the distribution of $\xi$ is normal then the parameters
$\alpha$, $\beta$, $\sigma_1$ and $\sigma_2$ are unidentifiable. Reiersøl proved [4] that these parameters are
identifiable in all other cases. Wald and Housner and Brennan found consistent
estimates of these parameters assuming that, although the particular values of
$\xi$ are not known exactly, a certain amount of knowledge concerning the values
of $\xi$ is available. The present note gives a method for obtaining a consistent
estimate of $\beta$, which is the key to the problem of estimating the four parameters,
for the case where it is known that a specified central moment of the distribution
of $\xi$ exists and differs from that of the normal distribution.

Since work on this subject continues, the present brief note deals particularly
with the simplest case, when one of the odd central moments of $\xi$ exists and
differs from the "normal" value, zero. It will be observed that the hypotheses
made here are of entirely different character from those adopted by other writers.
The present note postulates knowledge concerning a moment of the distribution
of $\xi$, whereas the papers quoted postulate some knowledge of the particular
values assumed by $\xi$. The method adopted was suggested by a remark made by
Neyman [5] in 1936.

3. Preliminary theorems. Let

(3) $x_\cdot = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, $\quad y_\cdot = \dfrac{1}{n}\sum_{i=1}^{n} y_i$,

and let b be an arbitrary real number.

THEOREM 1: If $\mu_3$, the third central moment of $\xi$, exists then the arithmetic
mean

(4) $F_{n,1}(b) = \dfrac{1}{n}\sum_{i=1}^{n} [y_i - y_\cdot - b(x_i - x_\cdot)]^3$

converges in probability to

(5) $(\beta - b)^3\mu_3$.

PROOF: Simple algebra gives

(6) $F_{n,1}(b) = (\beta - b)^3\,\dfrac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_\cdot)^3$
$\qquad + 3(\beta - b)^2\,\dfrac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_\cdot)^2[v_i - v_\cdot - b(u_i - u_\cdot)]$
$\qquad + 3(\beta - b)\,\dfrac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_\cdot)[v_i - v_\cdot - b(u_i - u_\cdot)]^2$
$\qquad + \dfrac{1}{n}\sum_{i=1}^{n} [v_i - v_\cdot - b(u_i - u_\cdot)]^3.$

It is obvious that further expansion will express $F_{n,1}(b)$ in terms of averages of
the type

(7) $\dfrac{1}{n}\sum_{i=1}^{n} \xi_i^p u_i^q v_i^r$,

with $p + q + r \le 3$. Since all the terms over which each average is taken are
completely independent, follow the same law and possess finite expectations, the
familiar theorem of Khintchine assures that, as n is increased, each average (7)
tends in probability to its expectation. Using Slutsky's theorem (see Cramér [6],
p. 255), we conclude that $F_{n,1}(b)$ tends in probability to the limit obtained by
replacing each average in the expansion (6) by its expectation and then letting
$n \to \infty$. The computations are easy and give

(8) $\operatorname*{plim}_{n \to \infty} F_{n,1}(b) = (\beta - b)^3\mu_3.$

Q.E.D. 
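The convergence stated in Theorem 1 can be illustrated by simulation. The sketch below is only an illustration under assumed settings: $\xi$ is drawn from a unit exponential law (so that $\mu_3 = 2$), the errors are normal with standard deviation 0.5, and the parameter values and function names are made up.

```python
import random

def f_n1(xs, ys, b):
    """F_{n,1}(b): mean cube of the centred residuals y_i - y. - b (x_i - x.)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((y - y_bar - b * (x - x_bar)) ** 3 for x, y in zip(xs, ys)) / n

def simulate(n, alpha, beta, sigma_u, sigma_v, rng):
    """Draw n pairs from structure (1) with a skewed, unobserved xi."""
    xs, ys = [], []
    for _ in range(n):
        xi = rng.expovariate(1.0)  # assumed law of xi; third central moment is 2
        xs.append(xi + rng.gauss(0.0, sigma_u))
        ys.append(alpha + beta * xi + rng.gauss(0.0, sigma_v))
    return xs, ys

rng = random.Random(12345)
xs, ys = simulate(200_000, alpha=1.0, beta=2.0, sigma_u=0.5, sigma_v=0.5, rng=rng)
b = 0.5
limit = (2.0 - b) ** 3 * 2.0  # (beta - b)^3 * mu_3
print(f_n1(xs, ys, b), limit)
```

With a sample of this size the two printed numbers should be close, and the statistic evaluated at $b = \beta$ should be near zero.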

Let $\{X_n\}$ denote a sequence of observable random variables (multivariate or
not) such that the distribution function of $X_n$ depends on the parameters
$\theta_i$ with $a_i < \theta_i < b_i$, $i = 1, 2, \ldots, s$. Furthermore, let $\lambda$ denote a real variable
and $\{\phi_n(X_n, \lambda)\}$ a sequence of functions of the arguments $X_n$ and $\lambda$ defined for
all possible values of $X_n$ and for all values of $\lambda$ within the limits $a_1 \le \lambda \le b_1$.

THEOREM 2: If the sequence of functions $\{\phi_n(X_n, \lambda)\}$ has the following properties:

(i) whatever be the true values $\theta_1, \theta_2, \ldots, \theta_s$ of the parameters $\theta_i$ within the
limits $a_i < \theta_i < b_i$, $i = 1, 2, \ldots, s$, as n is increased, the sequence $\{\phi_n(X_n, \lambda)\}$
tends in probability to a function $f(\lambda, \theta_1)$ of arguments $\lambda$ and $\theta_1$ only,

(ii) whatever be $\delta > 0$, there exist in $(a_1, b_1)$ two numbers $\lambda_1$ and $\lambda_2$, each differing
from $\theta_1$ by less than $\delta$ and such that the product $f(\lambda_1, \theta_1) f(\lambda_2, \theta_1)$ is negative,

(iii) for every n and every possible value $x_n$ of $X_n$, the function $\phi_n(x_n, \lambda)$ is
continuous with respect to $\lambda$ for $a_1 \le \lambda \le b_1$,

then whatever be $\epsilon > 0$ and $\eta > 0$ there exists a number $N_{\epsilon,\eta}$ such that for $n > N_{\epsilon,\eta}$
the probability that the equation $\phi_n(X_n, \lambda) = 0$ has a root between $\theta_1 - \epsilon$ and
$\theta_1 + \epsilon$ exceeds $1 - \eta$.

PROOF: Let $\epsilon > 0$ and $\eta > 0$ be two arbitrarily small numbers. Let $\lambda_1$ and
$\lambda_2$ be two numbers such that $\lambda_i \in (a_1, b_1)$ and $|\theta_1 - \lambda_i| < \epsilon$, $i = 1, 2$, and such
that $f(\lambda_1, \theta_1) < 0 < f(\lambda_2, \theta_1)$. Select $N_{\epsilon,\eta}$ so large that for $n > N_{\epsilon,\eta}$ the probability
of having simultaneously

(9) $|\phi_n(X_n, \lambda_i) - f(\lambda_i, \theta_1)| < \tfrac{1}{2}|f(\lambda_i, \theta_1)|$ for $i = 1, 2$

differs from unity by less than $\eta$. It is clear that if the inequalities (9) are satisfied
for any particular value $x_n$ of $X_n$, then

(10) $\phi_n(x_n, \lambda_1) < 0 < \phi_n(x_n, \lambda_2)$

and the continuity of $\phi_n(x_n, \lambda)$ for $\lambda \in (a_1, b_1)$ implies that there exists a number
$\lambda(x_n)$ between $\lambda_1$ and $\lambda_2$ such that $\phi_n(x_n, \lambda(x_n)) = 0$. Obviously $|\theta_1 - \lambda(x_n)| < \epsilon$.
Thus, whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that the probability
that $\phi_n(X_n, \lambda)$ has a root in the interval $(\theta_1 - \epsilon, \theta_1 + \epsilon)$ exceeds $1 - \eta$
provided $n > N_{\epsilon,\eta}$. This proves Theorem 2.

Theorem 2 is treated as a convenient lemma on which to base the proof of the 
existence of a consistent estimate of the parameter 8 in (1). It is obvious, how- 
ever, that this Theorem has an independent interest of its own. 

4. Consistent estimates of the structural parameter $\beta$. Referring to the general
set-up of the problem of estimating the structural parameter $\beta$ in (1) and using
the notation (2) and (3), we prove the following theorems.

THEOREM 3: If the third central moment $\mu_3$ of $\xi$ exists and differs from zero, then
the equation

(11) $F_{n,1}(b) = \dfrac{1}{n}\sum_{i=1}^{n} [y_i - y_\cdot - b(x_i - x_\cdot)]^3 = 0$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.

PROOF: According to Theorem 1, whatever be b and $\beta$, the stochastic limit of
$F_{n,1}(b)$ is $(\beta - b)^3\mu_3$ and changes its sign as b passes through the value $\beta$. Theorem
2 implies then that whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that
for $n > N_{\epsilon,\eta}$ the probability that at least one of the roots of (11) will lie within
$\beta - \epsilon$ and $\beta + \epsilon$ is greater than $1 - \eta$. This proves the theorem.
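A root of equation (11) can be located numerically. The sketch below is an illustration under assumed settings (unit exponential $\xi$, normal errors with standard deviation 0.3, made-up parameter values and function names); it expands $F_{n,1}(b)$ as a cubic in b from four sample moments and applies bisection on a wide bracket. Bisection is a choice made here for simplicity, not a method prescribed by the note.

```python
import random

def centred_moments(xs, ys):
    """Sample means of dx^3, dx^2*dy, dx*dy^2 and dy^3 for centred data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m30 = m21 = m12 = m03 = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        m30 += dx ** 3
        m21 += dx * dx * dy
        m12 += dx * dy * dy
        m03 += dy ** 3
    return m30 / n, m21 / n, m12 / n, m03 / n

def make_f_n1(xs, ys):
    """F_{n,1}(b) = mean (dy - b dx)^3, written as a cubic polynomial in b."""
    m30, m21, m12, m03 = centred_moments(xs, ys)
    return lambda b: m03 - 3.0 * b * m12 + 3.0 * b * b * m21 - b ** 3 * m30

def bisect_root(func, lo, hi, iterations=100):
    """Plain bisection; requires a sign change of func on [lo, hi]."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

rng = random.Random(2024)
beta_true = 2.0
xs, ys = [], []
for _ in range(200_000):
    xi = rng.expovariate(1.0)  # assumed skewed law for xi
    xs.append(xi + rng.gauss(0.0, 0.3))
    ys.append(1.0 + beta_true * xi + rng.gauss(0.0, 0.3))

b_hat = bisect_root(make_f_n1(xs, ys), -10.0, 10.0)
print(b_hat)
```

Because the limit $(\beta - b)^3\mu_3$ is flat at $b = \beta$, the root responds strongly to sampling noise, so the estimate should be expected to converge slowly even for large n.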

Generally, let $\mu_m$ denote the m-th central moment of $\xi$.

THEOREM 4: If the distribution of $\xi$ has moments up to and including order 2m + 1
and if at least one of the first m odd central moments $\mu_{2k+1}$ differs from zero,
$k = 1, 2, \ldots, m$, then the equation

(12) $F_{n,m}(b) = \dfrac{1}{n}\sum_{i=1}^{n} [y_i - y_\cdot - b(x_i - x_\cdot)]^{2m+1} = 0$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.
PROOF: The proof of Theorem 4 exactly follows the lines of that of Theorem 3.
Using (1), (2) and (3), we write

(13) $F_{n,m}(b) = \sum_{k=0}^{2m+1} C_{2m+1}^{k}(\beta - b)^k\,\dfrac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_\cdot)^k[v_i - v_\cdot - b(u_i - u_\cdot)]^{2m+1-k}.$

It is easily seen that, as $n \to \infty$, $F_{n,m}(b)$ tends in probability to the limit

(14) $F_{n,m}(b) \to (\beta - b)^3\psi(\beta - b)$,

where $\psi(\beta - b)$ is a linear combination of even powers of $(\beta - b)$ with at least
one coefficient different from zero. It follows that the stochastic limit of $F_{n,m}(b)$
changes its sign as b passes through $\beta$ and the proof is completed by reference to
Theorem 2.

Note that the stochastic limit of the first derivative of $F_{n,m}(b)$, evaluated at
$b = \beta$, is zero, which is unfortunate. Furthermore, the order of contact of $F_{n,m}(b)$
at $b = \beta$ increases with the order of the first odd central moment of $\xi$ which
differs from zero. Therefore, the precision of estimating $\beta$ may be expected to be
better when the low odd central moments are not zero. Without narrowing the
generality of the case considered, it is difficult to make an evaluation of the
precision of the estimates obtained. Thus, for example, the familiar method of
evaluating the asymptotic variance requires the knowledge of higher moments of
$\xi$ than those considered here. For similar reasons, it is thus far impossible to
speak of the relative efficiency of the estimates found. For this purpose it would
be necessary to determine first the measure of the precision of the best estimate
whose consistency persists independently of the distribution of $\xi$ provided only
that at least one odd central moment differs from zero.

Once the consistent estimate $\hat{b}$ of $\beta$ is obtained, there is no particular difficulty
in obtaining consistent estimates of the other parameters.

J. Neyman has pointed out [7] that Theorem 2 may be used as the basis for
a very elementary proof of the consistency of maximum likelihood estimates.

ON MULTINOMIAL DISTRIBUTIONS WITH LIMITED FREEDOM:
A STOCHASTIC GENESIS OF PARETO'S AND PEARSON'S CURVES

By Maria Castellani

University of Kansas City

1. A multinomial law with limited freedom: Distribution functions of statistical
equilibrium. We intend to consider here a convenient model of statistical
mechanics, which by generalization of an approach used by Cantelli [1], shall
give us either Pareto's or Pearson's curves. Let us imagine that N elements
($N > k - 3$) have to be randomly distributed in a set of k continuous intervals
$t_i$ ($i = 1, 2, \ldots, k$) in $R_1$; the "a priori" probability associated with $t_i$ being
$p_i$, for $\sum_{i=1}^{k} p_i = 1$. Assuming that the elements have no preferences, they move
freely under the law of chance taking different configurations $(n_1, n_2, \ldots, n_k)$,
with probabilities $P(n_1, n_2, \ldots, n_k)$, $n_i$ being the number of elements placed
in $t_i$ and $\sum_{i=1}^{k} n_i = N$. The random variable Y(t) representing the total number
of configurations $(n_1, n_2, \ldots, n_k)$, therefore obeys a multinomial law with
$k - 1$ degrees of freedom, viz.:

(1) $P_{(n_1, n_2, \ldots, n_k)} = N!\prod_{i=1}^{k} \dfrac{p_i^{n_i}}{n_i!}$; $\quad \sum_{i=1}^{k} n_i = N$, $\quad \sum_{i=1}^{k} p_i = 1$.

We shall proceed to admit that the elements are not free, but that they have
preference in the choice of a suitable interval. This fact we associate with the
assumption that some forces of attraction are made to play in each interval.
For the sake of simplicity we shall consider that there are two independent
forces, say u(t) and v(t), whose convenient potential functions are respectively
f(t) and g(t), where

(2) $u(t) = \dfrac{df(t)}{dt}$; $\quad v(t) = \dfrac{dg(t)}{dt}$.

These potential functions we may, for instance, associate with the significance
of a certain quanta whose total is to be distributed among the elements and whose
significance must be established by consideration of the particular statistical
experiment. It is then admissible, at least in our first approach, to assume these
potential quantities to have a total constant magnitude, viz.: $\sum n_i f(t_i) = H_1$,
$\sum n_i g(t_i) = H_2$, where $H_1$ and $H_2$ are appropriate constants. This condition is
analogous to the assumption in statistical mechanics of the preservation of
energy. This analogy enables us to follow classical methods. We shall call our
method the method of "intervals of energy." Let us say that our system reaches
its canonic state when $P_{(n_1, n_2, \ldots, n_k)}$ is a maximum [2]. When this state occurs with
a probability close to the value one, we may say it is in statistical equilibrium.
It is well known that $P_{(n_1, n_2, \ldots, n_k)}$ reaches its maximum when:


(3) Piiieuum*9 @ thePunam 6 
Performing as usual, for example as in [3], we ultimately obtain:

(4) $n_i = a\,e^{-b f(t_i) - c g(t_i)}$,

where a, b, and c are arbitrary constants.

If N is sufficiently large, $n_i/N$ may be considered a probability, and precisely
the probability for Y(t) to score $n_i$ times the value $t_i$ when the canonical state
is reached. The problem may then be extended assuming that a continuous
function may interpolate the discrete values $n_1, n_2, \ldots, n_k$. Putting $y = n/N$,
assuming for the sake of simplicity that f(t) and g(t) along with their derivatives
are continuous functions of t, and grouping these constants into a single K,
formula (4) becomes:

(5) $y = K e^{-b f(t) - c g(t)}$,

then:

(6) $\dfrac{d \log y}{dt} = \dfrac{1}{y}\dfrac{dy}{dt} = -b\,\dfrac{df(t)}{dt} - c\,\dfrac{dg(t)}{dt} = -b\,u(t) - c\,v(t).$

Equation (6) is a generalization of the familiar Pearson differential equation
which generates his system of curves. It is obvious that (6) may determine a
large set of frequency curves, depending on the form of f(t) and g(t).

The above analysis may be extended to any number of acting forces provided 
they are less than k — 1 in number. 

2. Stochastic genesis of the Pareto and Pearson curves. We shall next show 
how the Pareto and Pearson curves belong to this family of frequency curves. 

The Pearsonian system of curves is derived by comparing its differential
equation with (6) to determine in these the most suitable functions for u(t) and
v(t). Thus,

(7) $\dfrac{1}{y}\dfrac{dy}{dt} = -\dfrac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = -b\,u(t) - c\,v(t).$

Corresponding to the decomposition into partial fractions of the middle term,
we have two sets of curves. When

$\beta_1 + \beta_2 t + \beta_3 t^2 = \beta_3(t - \gamma_1)(t - \gamma_2)$

and $\gamma_1$, $\gamma_2$ are real numbers, then

(8) $-b\,u(t) = -\dfrac{\gamma_1 + a}{\beta_3(\gamma_1 - \gamma_2)(t - \gamma_1)} = \dfrac{p}{t - \gamma_1}$, $\quad -c\,v(t) = -\dfrac{\gamma_2 + a}{\beta_3(\gamma_2 - \gamma_1)(t - \gamma_2)} = \dfrac{q}{t - \gamma_2}$,

p and q being suitable grouping constants.
Under these assumptions two forces are acting in each "class of energy",
each one being inversely proportional to the distance of the interval from some origin.

Substituting (8) into (6) and integrating, we obtain corresponding to
(5), after grouping the constants of integration into K,

$y = K(t - \gamma_1)^p (t - \gamma_2)^q$;

where K, $\gamma_1$, $\gamma_2$, p, q also have the significance of statistical constants according
to which we obtain Pearson's curves of Type I or VI.
When $\beta_3 = 0$, we have by the same process:

$-b\,u(t) = -\dfrac{1}{\beta_2} = q_1$, $\quad -c\,v(t) = -\dfrac{a - \beta_1/\beta_2}{\beta_1 + \beta_2 t} = \dfrac{q_2}{p + t}$.

Hence, by grouping together the statistical constants under K, p, $q_1$, $q_2$, we
obtain:

$y = K(p + t)^{q_2} e^{q_1 t}$,

which is a Pearson curve of Type III.¹ In each class interval two forces are acting;
one is constant and the other is inversely proportional to the distance of the
interval from a fixed origin.
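The Type III case can be checked numerically: with $\beta_3 = 0$, the logarithmic derivative of $y = K(p + t)^{q_2} e^{q_1 t}$ should reproduce the middle term of (7). The sketch below uses made-up parameter values, takes $p = \beta_1/\beta_2$, $q_1 = -1/\beta_2$ and $q_2 = -(a - \beta_1/\beta_2)/\beta_2$ from the partial-fraction split, and compares a central finite difference with $-(t + a)/(\beta_1 + \beta_2 t)$.

```python
import math

def pearson_rhs(t, a, beta1, beta2):
    """Middle term of (7) when beta_3 = 0."""
    return -(t + a) / (beta1 + beta2 * t)

def type_iii_log_density(t, a, beta1, beta2, log_k=0.0):
    """log y for y = K (p + t)^q2 * exp(q1 * t), constants from the partial fractions."""
    p = beta1 / beta2
    q1 = -1.0 / beta2
    q2 = -(a - beta1 / beta2) / beta2
    return log_k + q2 * math.log(p + t) + q1 * t

def log_derivative(t, a, beta1, beta2, h=1e-6):
    """Central finite difference of log y with respect to t."""
    upper = type_iii_log_density(t + h, a, beta1, beta2)
    lower = type_iii_log_density(t - h, a, beta1, beta2)
    return (upper - lower) / (2.0 * h)

# Hypothetical constants: a = -1, beta1 = 2, beta2 = 1, so y = K (2 + t)^3 e^{-t}.
for t in (0.0, 0.5, 3.0):
    print(t, log_derivative(t, -1.0, 2.0, 1.0), pearson_rhs(t, -1.0, 2.0, 1.0))
```

The two printed columns should agree up to the finite-difference error.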

We obtain a Pareto curve when in (8) either p or q is zero. Under the indicated 
assumptions the Pareto income distribution curve appears in a new light. If the 
acting forces are reduced to one, and this one force is inversely proportional to the 
distance of the interval from some origin, the Pareto curve represents a special 
case of the Pearsonian curve. 

In (7), we now consider the decomposition of the Pearson function for the case
where the denominator does not have real roots. This decomposition may be
indicated as follows:

$-\dfrac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = -\dfrac{t + \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + \beta_1/\beta_3 - \beta_2^2/4\beta_3^2\}} - \dfrac{a - \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + \beta_1/\beta_3 - \beta_2^2/4\beta_3^2\}}.$

Setting

$p_1 = \dfrac{\beta_2}{2\beta_3}$, $\quad p_2 = a - \dfrac{\beta_2}{2\beta_3}$, $\quad g = \dfrac{\beta_1}{\beta_3} - \dfrac{\beta_2^2}{4\beta_3^2}$,

we have

$-b\,u(t) = -\dfrac{t + p_1}{\beta_3\{(t + p_1)^2 + g\}}$, $\quad -c\,v(t) = -\dfrac{p_2}{\beta_3\{(t + p_1)^2 + g\}}$.

These we may interpret as forces of the Newtonian type. By grouping the
statistical constants appropriately under K, $p_1$, g, $m_1$, $m_2$, $m_3$, we derive from
(7) the following equation:

$y = K[(t + p_1)^2 + g]^{m_1} e^{m_2 \arctan m_3(t + p_1)}$.

This is the familiar Pearson curve of Type IV.
Other distributions of the same family can be easily found by the same method.

¹ A. L. Bowley has found in his well-known analysis of food expenditures of urban
families, that the distribution of weekly family expenditures can be best expressed by a
Pearson curve of Type III. This is not surprising, since it is exactly a case where we can
assume the joint effect of a constant factor and another factor acting in inverse proportion
to the interval (again in the sense of the distance from a suitable origin). The constant
factor in our case is the human need of food, while the factor acting inversely to the interval,
can be taken as a response to prices. See [4].
3. The frequency curves and their statistical equilibrium. The conclusive step
in this analysis is in finding the probability of the most likely configuration. By
generalizing a process of statistical mechanics first used by Castelnuovo [5], we
assume any configuration $(n_1', n_2', \ldots, n_k')$ slightly different from the most
probable $(n_1, n_2, \ldots, n_k)$ (the canonical configuration). Setting

$n_i' = \alpha_i + n_i$ $\quad (i = 1, 2, \ldots, k)$,

we have by conditions (1) and (3):

(9) $\sum_{i=1}^{k} \alpha_i = 0$, $\quad \sum_{i=1}^{k} \alpha_i f(t_i) = 0$, $\quad \sum_{i=1}^{k} \alpha_i g(t_i) = 0$.

The sum of the values of $P_{(n_1', n_2', \ldots, n_k')}$ will give us the total probability of scoring
an $n_i'$ slightly different from $n_i$. Let us designate by $\Pi$ the total probability of
having $P_{(n_1', n_2', \ldots, n_k')}$ satisfying all above conditions. By following Castelnuovo's
method [2], [5], we obtain:

$P_{(n_1', n_2', \ldots, n_k')} = P_{(n_1, n_2, \ldots, n_k)} \exp\left[-\dfrac{1}{2}\sum_{i=1}^{k} \dfrac{\alpha_i^2}{n_i}\right].$

We determine all integral sets of $n_i'$ compatible with (9), and with a condition of
size

$\sum_{i=1}^{k} \dfrac{\alpha_i^2}{n_i} \le 2u_0.$

By a well-known process [2], [5] for any $u_0$

$\Pi = \dfrac{1}{\Gamma\left(\frac{k - 3}{2}\right)}\int_0^{u_0} u^{(k-5)/2} e^{-u}\,du.$

This is the familiar Chi-square distribution function with $(k - 3)$ degrees of
freedom. By considering $u_0$ as increasing with N, we can conclude that

$\lim_{N \to \infty} \Pi = 1.$

The state of maximum likelihood has a real significance only if it is almost certain
that we will obtain either such a state or any one practically equivalent to it.
This occurs when the state of maximum probability has little chance to change;
it is a so-called stationary state or state of statistical equilibrium. It would mean a
great deal if we were able to say through how many states the statistical
phenomenon must pass before attaining its equilibrium, or in other words, whether
the ergodic hypothesis of the kinetic theory of gases applies to certain social or
economic phenomena. We will not go further into this now; the results obtained
here must be considered as an initial exploratory step, which does permit us,
however, to end with the following conclusive statement:

    If N elements, provided N is large enough, are distributed at random in
    k class "intervals of energy", it is highly probable that they will approach
    a configuration of statistical equilibrium, a distribution of maximum
    probability. Pareto's and Pearson's curves represent special configurations of
    statistical equilibrium in a stochastic system.

ON THE COMPLETELY UNBIASSED CHARACTER OF TESTS OF INDEPENDENCE
IN MULTIVARIATE NORMAL SYSTEMS

By R. D. Narain

Indian Council of Agricultural Research

1. Introductory. To prove the unbiassed character of likelihood ratio tests
like the test of significance of the multiple correlation coefficient or Hotelling's
$T^2$ test, Daly [1] used the non-null frequency distributions of these test criteria.
This leads to obvious difficulties when tackling the general regression problem
and the test of independence of several sets of variates, and Daly [1] has shown
only their locally unbiassed character.

This paper demonstrates an approach which does not require an explicit
knowledge of the frequency distribution of the test criteria and it has been
possible to prove that the likelihood ratio test for the general regression problem
and the Wilks' criterion for independence of sets of variates are completely
unbiassed. The argument proceeds in a chain, the unbiassedness of the Wilks'
criterion following ultimately from the unbiassedness of the t-test. The link up has
been achieved by working with a chain of conditional distribution densities, a
principle employed earlier by the author [3], [4] in presenting a unified distribution
theory of the common statistical coefficients relevant to normal theory.
2. The t-test. As the simplest demonstration of the procedure, which is applicable generally, consider the t-test for the significance of the mean of a normal population. Let the frequency function of a sample of size n be

$$(2\pi V)^{-n/2} \exp\Big[-\frac{1}{2V}\sum_{i=1}^{n}(x_i - m)^2\Big] \prod_{i=1}^{n} dx_i. \tag{1}$$

The region W − w complementary to the critical region w for testing the hypothesis

$$m = 0$$

is given by

$$\bar{x}^2 < k^2\chi^2,$$

where k is a positive constant depending on the size of w and

$$n\bar{x} = \sum_{i=1}^{n} x_i, \qquad \chi^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2.$$

We write

$$I(m) = \int_0^{\infty}\Big[\int_{-k\chi}^{k\chi}\sqrt{\frac{n}{2\pi V}}\; e^{-(n/2V)(\bar{x}-m)^2}\, d\bar{x}\Big] f(\chi^2)\, d(\chi^2), \tag{2}$$

where

$$f(\chi^2)\, d(\chi^2)$$

is the frequency function of $\chi^2$, which is distributed independently of $\bar{x}$. To show that the test is completely unbiassed is equivalent to showing that

$$I(m) < I(0) \quad \text{for all values of } m.$$

We have

$$\frac{\partial I}{\partial m} = \sqrt{\frac{n}{2\pi V}}\int_0^{\infty}\Big\{e^{-(n/2V)(k\chi+m)^2} - e^{-(n/2V)(k\chi-m)^2}\Big\} f(\chi^2)\, d(\chi^2),$$

which is positive or negative according as m is negative or positive. Therefore 
I(m) < I(0). 
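
The inequality I(m) < I(0) can be checked numerically. The following is only a rough Monte Carlo sketch, not part of the original argument: it assumes V = 1 and reads the acceptance region as $\bar{x}^2 < k^2\chi^2$; the function name `acceptance_rate` and the values of n, k, the number of trials and the seed are illustrative assumptions.

```python
import random


def acceptance_rate(m, n=8, k=0.3, trials=20000, seed=1):
    """Monte Carlo estimate of I(m) = P(xbar^2 < k^2 * chi2) for N(m, 1) samples.

    n, k, trials and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        xs = [rng.gauss(m, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        chi2 = sum((x - xbar) ** 2 for x in xs)  # sum of squares about the mean
        if xbar * xbar < k * k * chi2:           # the sample falls in W - w
            accepted += 1
    return accepted / trials


if __name__ == "__main__":
    for m in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"m = {m:+.1f}   estimated I(m) = {acceptance_rate(m):.3f}")
```

Under these assumptions the estimated acceptance probability should peak at m = 0 and fall off roughly symmetrically on either side, which is what complete unbiassedness requires.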


3. The $E^2$ and $R^2$ tests. Let the frequency function of n observations of a random variate $x_p$ be

$$(2\pi V)^{-n/2}\exp\Big[-\frac{1}{2V}\sum_{i=1}^{n}\Big(x_{ip} - \sum_{r=1}^{p-1}\beta_r x_{ir}\Big)^2\Big]\prod_{i=1}^{n} dx_{ip}. \tag{3}$$

With the usual notation for partial variates in regression analysis, the critical region w based on the likelihood ratio test for the hypothesis

$$0 = \beta_m = \beta_{m+1} = \cdots = \beta_{p-1}, \qquad m \le p-1,$$

is given by

$$\frac{\sum_{i=1}^{n} x^2_{ip\cdot(12\cdots p-1)}}{\sum_{i=1}^{n} x^2_{ip\cdot(12\cdots m-1)}} < \text{a positive constant}.$$

It can be shown [2], [3] that this ratio can be expressed in the form

$$\frac{\chi^2}{\chi^2 + \sum_{r=m}^{p-1} z_r^2},$$

where the frequency function of $\chi^2$ and the $z_r$ is

$$\frac{(2\pi V)^{-(p-m)/2}\,(2V)^{-(n-p+1)/2}}{\Gamma\big(\tfrac{n-p+1}{2}\big)}\exp\Big[-\frac{1}{2V}\Big\{\chi^2 + \sum_{r=m}^{p-1}(z_r - \eta_r)^2\Big\}\Big](\chi^2)^{(n-p-1)/2}\, d(\chi^2)\prod_{r=m}^{p-1} dz_r. \tag{4}$$

The hypothesis to be tested then becomes

$$0 = \eta_m = \eta_{m+1} = \cdots = \eta_{p-1}.$$

The region W − w complementary to w is given by

$$\sum_{r=m}^{p-1} z_r^2 \le k\chi^2,$$

where k is a positive constant determined by the size of w. Denote by $I(\eta_{p-1}, \eta_{p-2}, \cdots, \eta_m)$ the integral of (4) over the region W − w. Differentiating I with respect to $\eta_{p-1}$, performing the integration with respect to $z_{p-1}$ and arguing exactly as in section 2 above, we obtain

$$I(\eta_{p-1}, \eta_{p-2}, \cdots, \eta_m) < I(0, \eta_{p-2}, \eta_{p-3}, \cdots, \eta_m).$$

Note that $z_{p-2}$ is distributed independently of $z_{p-1}$. Therefore, starting with $\eta_{p-1} = 0$ in (4) and considering the integration with respect to $z_{p-2}$ first, we obtain as before $I(0, \eta_{p-2}, \cdots, \eta_m) < I(0, 0, \eta_{p-3}, \cdots, \eta_m)$ and thus finally $I(\eta_{p-1}, \eta_{p-2}, \cdots, \eta_m) < I(0, 0, \cdots, 0)$, which proves the completely unbiassed character of the $E^2$-test. The test of significance of the multiple correlation coefficient with any number of the predicting variates being fixed or random may be considered as a corollary to the above. We have only to multiply the frequency function (3) by a factor dF representing the frequency function of the random predicting variates (which need not necessarily be normal). This does not affect either the test criterion or the arguments showing its unbiassedness. The test of significance of the multiple correlation coefficient is thus completely unbiassed. 
4. The general regression problem. Given the distribution

$$(2\pi)^{-n(p-m)/2}\,|a^{rs}|^{n/2}\exp\Big[-\tfrac{1}{2}\sum_{i}\sum_{r,s} a^{rs}\Big(x_{ir} - \sum_{h}\beta_{rh}x_{ih}\Big)\Big(x_{is} - \sum_{h}\beta_{sh}x_{ih}\Big)\Big]\times\prod_{i,r} dx_{ir}, \tag{5}$$

$$i = 1, 2, \cdots, n; \qquad r, s = m+1, m+2, \cdots, p; \qquad n > p > m > l,$$

where the matrix $\|x_{ih}\|$ is of rank m. The hypothesis H to be tested is

$$\beta_{rv} = 0, \qquad r = m+1, m+2, \cdots, p; \quad v = l+1, l+2, \cdots, m.$$

The likelihood ratio test gives the critical region defined by

$$\lambda = \frac{|a_{rs}|}{|a'_{rs}|} < \text{a positive constant},$$

where, with the usual regression notation for partial variates,

$$a_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots m)}\, x_{is\cdot(12\cdots m)}, \qquad r, s = m+1, m+2, \cdots, p,$$

$$a'_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots l)}\, x_{is\cdot(12\cdots l)}.$$

Now we note that

$$\lambda = \prod_{r=m+1}^{p}(1 - E_r^2) = (1 - E_p^2)\prod_{r=m+1}^{p-1}(1 - E_r^2), \tag{6}$$

where

$$1 - E_r^2 = \frac{\sum_{i} x^2_{ir\cdot(12\cdots r-1)}}{\sum_{i} x^2_{ir\cdot(12\cdots l,\, m+1,\, m+2,\, \cdots\, r-1)}}.$$

Since the statistic $\lambda$ is invariant to linear transformations of the random variates $x_{m+1}, x_{m+2}, \cdots, x_p$, the distribution (5) may be simplified to

$$\prod_{r=m+1}^{p}\Big[(2\pi V_r)^{-n/2}\exp\Big\{-\frac{1}{2V_r}\sum_{i}\Big(x_{ir} - \sum_{h}\beta_{rh}x_{ih}\Big)^2\Big\}\prod_{i} dx_{ir}\Big], \tag{7}$$

$$i = 1, 2, \cdots, n; \qquad h = 1, 2, \cdots, m.$$

Denote by $I(\beta_{pv}, \beta_{p-1,v}, \cdots, \beta_{m+1,v})$ the integral of (7) over the region W − w complementary to the critical region w, where $\beta_{rv}$ in I stands for the entire set of parameters $\beta_{r,l+1}, \beta_{r,l+2}, \cdots, \beta_{rm}$. We may first integrate over a subregion of W − w over which $\prod_{r=m+1}^{p-1}(1 - E_r^2)$ has a given value. Using identity (6) and the result of section 3 it follows immediately that

$$I(\beta_{pv}, \beta_{p-1,v}, \cdots, \beta_{m+1,v}) < I(0, \beta_{p-1,v}, \beta_{p-2,v}, \cdots, \beta_{m+1,v}).$$

If $\beta_{pv} = 0$, the distribution of $E_p^2$ is independent of that of $E_{p-1}^2$. Hence, starting with $\beta_{pv} = 0$ in (7) and considering the integration for $E_{p-1}^2$ first, we obtain

$$I(0, \beta_{p-1,v}, \beta_{p-2,v}, \cdots, \beta_{m+1,v}) < I(0, 0, \beta_{p-2,v}, \cdots, \beta_{m+1,v}).$$

Thus finally

$$I(\beta_{pv}, \beta_{p-1,v}, \cdots, \beta_{m+1,v}) < I(0, 0, \cdots, 0),$$

which proves the completely unbiassed character of the test. 
5. Test of independence of sets of variates. Consider n observations of q sets of random variates distributed in the multivariate normal form

$$\text{Const} \times \exp\Big[-\tfrac{1}{2}\sum_{i}\sum_{r,s} a^{rs}(x_{ir} - m_r)(x_{is} - m_s)\Big]\prod_{i,r} dx_{ir}, \tag{8}$$

$$i = 1, 2, \cdots, n; \qquad r, s = 1, 2, \cdots, l_1,\ l_1+1,\ l_1+2, \cdots, l_2,\ l_2+1, \cdots, l_q; \qquad n > l_q.$$

Denote by $D_j$ the determinant of the sample dispersion matrix of the j-th set of variates and by $D(j)$ the determinant of the dispersion matrix of the first j sets taken together. The Wilks’ statistic used for testing the independence of the q sets is given by

$$\Lambda = \frac{D(q)}{\prod_{j=1}^{q} D_j} = \prod_{j=2}^{q}\lambda_j, \tag{9}$$

where

$$\lambda_j = \frac{D(j)}{D(j-1)\, D_j}, \qquad j = 2, 3, \cdots, q.$$

The region W − w complementary to the critical region w is defined by

$$\Lambda > \text{a positive constant}.$$

The statistic $\Lambda$ is invariant to linear transformations within each set of variates. The distribution (8) may therefore without loss of generality be written in the form

$$\prod_{j=1}^{q}\Big[\prod_{r=l_{j-1}+1}^{l_j}\Big\{(2\pi V_r)^{-n/2}\exp\Big(-\frac{1}{2V_r}\sum_{i=1}^{n}\Big(x_{ir} - \sum_{u=0}^{l_{j-1}} b_{ru}x_{iu}\Big)^2\Big)\prod_{i=1}^{n} dx_{ir}\Big\}\Big]. \tag{10}$$

Let $B_j$ ($j = 2, 3, \cdots, q$) stand for the set of constants

$$b_{ru}, \qquad r = l_{j-1}+1,\ l_{j-1}+2, \cdots, l_j; \quad u = 1, 2, \cdots, l_{j-1},$$

and let

$$B_j = 0 \tag{11}$$

imply the vanishing of all the constants of the set $B_j$. The q sets of variates will be independent if (11) holds for all values of j from 2 to q. Denote by $I(B_q, B_{q-1}, \cdots, B_2)$ the integral of (10) over the region W − w. Integrating first over the sub-region of W − w for which

$$\prod_{j=2}^{q-1}\lambda_j$$

has a given value and using the result of section 4, it follows that

$$I(B_q, B_{q-1}, \cdots, B_2) < I(0, B_{q-1}, \cdots, B_2).$$

Also if $B_q = 0$, $\lambda_q$ is distributed independently of $\lambda_{q-1}$. Hence starting with $B_q = 0$ in (10) and integrating for $\lambda_{q-1}$ first, we obtain

$$I(0, B_{q-1}, B_{q-2}, \cdots, B_2) < I(0, 0, B_{q-2}, \cdots, B_2).$$

Thus finally, $I(B_q, B_{q-1}, \cdots, B_2) < I(0, 0, \cdots, 0)$, which proves the completely 
unbiassed character of the Wilks criterion. 
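
The chain argument rests on the factorization of the Wilks statistic into the successive ratios D(j)/(D(j−1) D_j). As an illustration only, the sketch below computes both sides of that identity from a data matrix using plain Python; the helper names (`det`, `dispersion`, `wilks`) and the simulated data are assumptions made for this example and do not come from the paper.

```python
import random


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def dispersion(data):
    """Matrix of sums of squares and products about the sample means."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[r] - means[r]) * (row[s] - means[s]) for row in data)
             for s in range(p)] for r in range(p)]


def sub(mat, idx):
    """Principal submatrix on the index set idx."""
    return [[mat[r][s] for s in idx] for r in idx]


def wilks(data, sizes):
    """Return (Lambda, [lambda_2, ..., lambda_q]) for consecutive sets of given sizes."""
    s_mat = dispersion(data)
    bounds = [0]
    for size in sizes:
        bounds.append(bounds[-1] + size)
    q = len(sizes)
    d_set = [det(sub(s_mat, range(bounds[j], bounds[j + 1]))) for j in range(q)]
    d_cum = [det(sub(s_mat, range(0, bounds[j + 1]))) for j in range(q)]
    big_lambda = d_cum[-1]
    for d in d_set:
        big_lambda /= d
    lambdas = [d_cum[j] / (d_cum[j - 1] * d_set[j]) for j in range(1, q)]
    return big_lambda, lambdas


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
    big_lambda, lambdas = wilks(data, [2, 1, 2])
    print(big_lambda, lambdas)
```

The product of the returned ratios agrees with Λ up to rounding, since D(1) = D_1 and the intermediate determinants cancel.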
ON THE DISTRIBUTION OF THE TWO CLOSEST AMONG A SET OF 
THREE OBSERVATIONS¹ 

By G. R. Seth 
Iowa State College 

1. Introduction. In this note we obtain the joint distribution of the two closest observations $x'$, $x''$ ($x' < x''$) of the set $x_1, x_2, x_3$ ($x_1 \le x_2 \le x_3$) when the distribution of $x_1, x_2, x_3$ is given or can be obtained.² We will assume that in general the density function is given by $f(x_1, x_2, x_3)$ and that it is continuous in the variables involved. We also find the distributions of certain statistics depending on $x'$ and $x''$. We will denote the density and the cumulative distribution function of a normal variate with mean zero and unit variance by $\varphi(x)$ and $G(x)$.

1 The results in this paper were presented at a meeting of the Institute of Mathematical 
Statistics in Madison, Wisconsin, September 9, 1948. 

2 The author’s attention was drawn to this problem while visiting the National Bureau 
of Standards in the Spring of 1948, by Mr. Julius Lieblein of the Statistical Engineering 


2. Distribution of the two closest. Let $x'$, $x''$ be the two closest among the set $x_1, x_2, x_3$ ($x_1 \le x_2 \le x_3$). Let $P(S_1, S_2, \cdots, S_n)$ denote the probability that the events $S_1, S_2, \cdots, S_n$ occur. Let us consider $P(x' < s, x'' < t)$ for $s < t$. For $s > t$, it reduces to $P(x'' < t)$, i.e., the marginal cumulative distribution of $x''$.

Now

$$P(x' < s,\ x'' < t) = P(x_1 < s,\ x_2 < t,\ x_2 - x_1 < x_3 - x_2) + P(x_2 < s,\ x_3 < t,\ x_3 - x_2 < x_2 - x_1). \tag{1}$$

The equalities, here as well as elsewhere, are omitted as the variables admit continuous distributions. Let the first and second terms on the right side in (1) be denoted by P(A) and P(B) respectively, where A, B denote the events in the respective brackets. The event B can be further split up into more elementary events whose probabilities can be easily found. B can be seen to be equivalent to

$$\Big(x_1 < 2s - t,\ \ x_1 < x_2 < \frac{x_1 + t}{2},\ \ x_2 < x_3 < 2x_2 - x_1\Big)$$

$$+\ \big(2s - t < x_1 < s,\ \ x_1 < x_2 < s,\ \ x_2 < x_3 < 2x_2 - x_1\big)$$

$$+\ \Big(x_1 < 2s - t,\ \ \frac{x_1 + t}{2} < x_2 < s,\ \ x_2 < x_3 < t\Big).$$


We may write (1) in the form of integrals, and differentiating under the integral sign with respect to t and s we obtain

$$\frac{\partial^2 P}{\partial s\, \partial t} = \int_{2t-s}^{\infty} f(s, t, x_3)\, dx_3 + \int_{-\infty}^{2s-t} f(x_1, s, t)\, dx_1. \tag{2}$$

The right hand side of (2) gives the density function of $x'$, $x''$ at $x' = s$, $x'' = t$. Let $f_{ij}(x_i, x_j)$ be the density function of $x_i$ and $x_j$ ($i \ne j$; $i, j = 1, 2, 3$). Then the density function $p(x', x'')$ of $x'$ and $x''$ can be put into the form

$$p(x', x'') = f_{12}(x', x'')\,\big[1 - F_3(2x'' - x' \mid x_1 = x',\ x_2 = x'')\big] + f_{23}(x', x'')\,\big[F_1(2x' - x'' \mid x_2 = x',\ x_3 = x'')\big], \tag{3}$$

where $F_i(x_i \mid x_j = l,\ x_k = m)$ represents the cumulative distribution function of the conditional density function of $x_i$ when $x_j$ and $x_k$ are fixed at the values l and m respectively. If, before ordering, the three observations are independent and from the same population having the density function f(x), then (3) with the help of

$$f(x_1, x_2, x_3) = 6 f(x_1) f(x_2) f(x_3)$$

reduces to

$$p(x', x'') = 6 f(x') f(x'')\,\big[1 - F(2x'' - x') + F(2x' - x'')\big], \tag{4}$$

where $F(x) = \int_{-\infty}^{x} f(x)\, dx$.

Laboratory. He understands that Mr. Lieblein has in preparation for submission to the 
Journal of Research of the National Bureau of Standards a paper giving intensive consideration 
to the closest pair and other aspects of samples of three observations. 


3. Joint distribution of $(x'' - x')$ and $(x'' - x')/(x_3 - x_1)$. Let $F_1(s, t)$ denote the cumulative distribution function of $u = x'' - x'$ and $w = \dfrac{x'' - x'}{x_3 - x_1}$. Then

$$F_1(s, t) = P\Big[x'' - x' < s,\ \ \frac{x'' - x'}{x_3 - x_1} < t\Big]. \tag{5}$$

The range for u is (0, ∞) and w varies between 0 and ½, and thus we limit ourselves to s varying from 0 to ∞, and t varying in (0, ½).

After some manipulation of the probability statement and differentiating with respect to s and t under the integral sign, in a manner similar to that of the previous section, we obtain the joint density function of u and w, given by

$$\frac{\partial^2 F_1(s, t)}{\partial s\, \partial t} = \frac{s}{t^2}\Big[\int_{-\infty}^{\infty} f\Big(x_1,\ x_1 + s,\ x_1 + \frac{s}{t}\Big)\, dx_1 + \int_{-\infty}^{\infty} f\Big(x_1,\ x_1 + \frac{s}{t} - s,\ x_1 + \frac{s}{t}\Big)\, dx_1\Big] = f_1(s, t) \ \text{(say)}. \tag{6}$$



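4. Applications to normal distributions. Let f(x) in (4) be the density function of a normal distribution with mean $\theta$ and variance unity; then (6) reduces to

$$f_1(u, w) = \frac{2\sqrt{3}\, u}{\pi w^2}\exp\Big[-\frac{u^2 (1 - w + w^2)}{3 w^2}\Big]. \tag{7}$$

Further the marginal densities of u and w will be given by

$$p(u) = 6\sqrt{2}\,\varphi\Big(\frac{u}{\sqrt{2}}\Big)\Big[1 - G\Big(\frac{\sqrt{3}\, u}{\sqrt{2}}\Big)\Big], \tag{8}$$

$$p(w) = \frac{3\sqrt{3}}{\pi}\,\frac{1}{1 - w + w^2}, \qquad 0 < w < \tfrac{1}{2}, \tag{9}$$

respectively.

The distribution of w has been obtained by J. Lieblein in an unpublished paper. 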
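
As a rough plausibility check on the distribution of w, one can simulate samples of three normal observations and compare the empirical distribution of w with the integral of the density $3\sqrt{3}/[\pi(1 - w + w^2)]$ on (0, ½). The sketch below does this; the function names, the evaluation point, the seed and the number of trials are illustrative assumptions, and the closed-form integral is worked out here rather than quoted from the paper.

```python
import math
import random


def closest_pair_ratio(sample):
    """w = (gap between the two closest observations) / (range of the three)."""
    x1, x2, x3 = sorted(sample)
    return min(x2 - x1, x3 - x2) / (x3 - x1)


def w_cdf(w):
    """Integral of 3*sqrt(3) / (pi * (1 - t + t^2)) over 0 <= t <= w, for w in [0, 1/2]."""
    return (6.0 / math.pi) * (math.atan((2.0 * w - 1.0) / math.sqrt(3.0)) + math.pi / 6.0)


def empirical_cdf(w, trials=100000, seed=11):
    """Fraction of simulated N(0, 1) triples whose ratio does not exceed w."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if closest_pair_ratio([rng.gauss(0.0, 1.0) for _ in range(3)]) <= w:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("simulated:", empirical_cdf(0.25), " from density:", w_cdf(0.25))
```

Because w is unchanged by shifts and rescalings of the observations, simulating with mean 0 and unit variance loses no generality.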
From (2) we can also obtain the joint density function of $u = x'' - x'$ and $v = \dfrac{x' + x''}{2}$. When we integrate this joint density function with respect to u, we obtain the density function of $v = \dfrac{x' + x''}{2}$ as given by

$$p(v) = 6\sqrt{2}\,\varphi\big[\sqrt{2}(v - \theta)\big]\Big[\frac{1}{2} + G\Big(\frac{\sqrt{2}(v - \theta)}{\sqrt{11}}\Big) - 2\int_0^{\infty}\varphi(x)\, G\Big(\frac{3x}{\sqrt{2}} + v - \theta\Big)\, dx\Big]. \tag{10}$$

The mean and the variance of the distribution of v are given by $\theta$ and $\dfrac{1}{2} + \dfrac{\sqrt{3}}{4\pi}$ respectively.

It may be remarked that if there is a suspicion that one of the extreme observations in a sample of three does not belong to the normal population under consideration, then the median of the sample is a better estimate than the average of the two closest. The efficiency of the latter compared to that of the former is about 70%, for the variance of the median in this case is given by $1 - \dfrac{\sqrt{3}}{\pi}$ compared to $\dfrac{1}{2} + \dfrac{\sqrt{3}}{4\pi}$ of v, the average of the two closest. The efficiency is here defined as the ratio of the variances for the two estimates. 
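
The efficiency figure of about 70% can be checked by simulation. The sketch below is an illustration under stated assumptions rather than the author's computation: it draws samples of three from a unit normal population, estimates the variance of the average of the two closest and of the median, and compares them with the closed forms ½ + √3/(4π) and 1 − √3/π, which are the readings of the variances consistent with the quoted efficiency. The function names, seed and number of trials are hypothetical.

```python
import math
import random


def mean_of_two_closest(sample):
    """Average of the two closest among three observations."""
    x1, x2, x3 = sorted(sample)
    return (x1 + x2) / 2.0 if (x2 - x1) < (x3 - x2) else (x2 + x3) / 2.0


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate_variances(trials=200000, seed=7):
    """Return (variance of the average of the two closest, variance of the median)."""
    rng = random.Random(seed)
    closest, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(3)]
        closest.append(mean_of_two_closest(sample))
        medians.append(sorted(sample)[1])
    return sample_variance(closest), sample_variance(medians)


if __name__ == "__main__":
    var_v, var_med = simulate_variances()
    print("average of two closest:", var_v, "theory:", 0.5 + math.sqrt(3) / (4 * math.pi))
    print("median:", var_med, "theory:", 1 - math.sqrt(3) / math.pi)
    print("efficiency:", var_med / var_v)
```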




ERRATA 


By W. FELLER 


Cornell University 


The author regrets the following inconsequential, but very disturbing, slips 
in his paper “On the Kolmogorov-Smirnov limit theorems for empirical distri- 
butions” (Annals of Math. Stat., Vol. 19 (1948), pp. 177-189): 

(1) In equation (1.4) on p. 178, the exponent $-\nu^2 z^2$ should be replaced by 
$-2\nu^2 z^2$. The same copying error occurs in the description of Smirnov’s table on 
p. 279. The proof is correct as it stands. 

(2) In the formulation of the continuity-theorem on p. 180 it is claimed that 
$u_k \to f(t)$ whereas in reality the continuity theorem permits only the conclusion 
that 

$$\delta \sum_{r=1}^{k} u_r \to \int_0^{t} f(x)\, dx. \tag{*}$$


This slip in formulation in no way affects the proofs since only (*) is used. 
(The assertion that the step functions {£,} converge pointwise is not based on a 
second application of the continuity theorem, but on the obvious fact that (*) 
implies 

$$\delta \sum_{r=1}^{k} g_r u_r \to \int_0^{t} g(x) f(x)\, dx,$$

where the step function $\{g_r\}$ converges uniformly to a continuous monotonic 
$g(x)$). 

The following corrections apply to the paper, ‘On the normal approximation 
to the binomial distribution” (Annals of Math. Stat., Vol. 16, (1945), pp. 319- 
329). 

(1) Equation (27) gives two variants of an estimate for the error p. The second 
should simply restate the first one in terms of the variable xz; in other words, 
the expression (p> + q°) in the second line of (27) should be replaced by 
pl — px/o)* + g(1 + gz/c)’. 

(2) The estimate p < o °/300 given in (28) is not valid over the entire range 
for which it is claimed. However, the further theory depends only on the fact 
that p = O(c *), and the estimate p < o °/30 is both correct and sufficient for 
our purposes. (Actually, no changes whatever are required in the proofs, since 
(28) is used explicitly only for a range where it is correct as stated). 

(3) On p. 324 it is stated that under the conditions of the main theorem 
(p. 325) k > 4,n — k > 4, whereas in reality the value 3 can occur in extreme 
cases. Fortunately, the assertion is not used anywhere in the proof, and the 
error p is negligible in all cases. 

Accordingly, no changes are required either in the formulation or the proof of 
the theorems. I am indebted to Dr. W. Hoeffding for calling my attention to the 
slips. 

(4) The first minus sign in footnote 5 should be an equality sign and the second 
minus in (70) a plus. 
ABSTRACTS OF PAPERS 
(Abstracts of papers presented at the Chapel Hill meeting of the Institute, March 17-18, 1950) 


1. A Method of Estimating the Parameters of an Autoregressive Time Series. 
S. G. Ghurye, University of North Carolina. 
The general autoregressive process of the second order is defined by the equations 

$$x_t = X_t + \eta_t,$$
$$X_t + a_1 X_{t-1} + a_2 X_{t-2} = \epsilon_t,$$

where $x_t$ is the value actually observed at time t, $X_t$ the corresponding theoretical value, 
$\epsilon_t$ the disturbance and $\eta_t$ the superposed variation. The estimates of $a_1$, $a_2$ given by Yule’s 
method are biased and inconsistent if $\eta_t$ is not identically zero, the permanent bias being a 
function of the unknown variance of $\eta_t$. The present paper proposes a method of estimation 
which is unaffected by the presence of $\eta_t$, and seems to be better than any other known 
method; and this conjecture is supported by the results of application to observational and 
artificial series. In this method the estimates $a_1$, $a_2$ are obtained by minimizing 


$$\sum_{k \le n}\Big\{(N - k - 2)^{-1}\sum_{t}\big(x_t + a_1 x_{t-1} + a_2 x_{t-2}\big)\big(x_{t+k} + a_1 x_{t+k-1} + a_2 x_{t+k-2}\big)\Big\}^2,$$


where n is some number small in comparison with N (which is the number of observations). 
In the above expression the usual approximation of substituting $(N - k - 2) r_k$ for 
$\sum_t x_t x_{t+k}$ may be made for computational convenience. The method has been used for 
fitting autoregressive processes to the series of annual averages of Wolfer’s sunspot numbers 
and that of Myrdal’s Swedish cost of living index numbers. The method is applicable 
to higher order processes. 


2. Most Powerful Rank Order Tests. (Preliminary Report). Wassily Hoeffding, 
University of North Carolina. 


Let $X_{11}, \cdots, X_{1n_1}, \cdots, X_{k1}, \cdots, X_{kn_k}$ be random variables with a joint probability function P(S) and let $P\{X_{ig} = X_{ih}\} = 0$ if $g \ne h$ ($i = 1, \cdots, k$). Let $H_0$ be a hypothesis which implies that P(S) is invariant under all permutations of $X_{i1}, \cdots, X_{in_i}$ ($i = 1, \cdots, k$). Let $r_{ij}$ ($j = 1, \cdots, n_i$) be the ranks of $X_{i1}, \cdots, X_{in_i}$. Under $H_0$ the $M = \prod n_i!$ rank permutations $R = (r_{11}, \cdots, r_{1n_1}, \cdots, r_{k1}, \cdots, r_{kn_k})$ have the same probability $P(R) = M^{-1}$. A test which depends only on the permutations R is called a rank order test (R.O.T.). A R.O.T. of size m/M which is most powerful (M.P.) against a simple alternative, $P_1(S)$, is determined by m permutations R for which $P_1(R)$ takes on its m largest values. 

For example, let the pairs $(X_1, Y_1), \cdots, (X_n, Y_n)$ be independent and identically distributed. Let $H_0$ state that $X_i$, $Y_i$ are independent, and let $H_1(\rho)$ be the hypothesis that $X_i$, $Y_i$ have a bivariate normal distribution with correlation $\rho$. We may assume that $X_1 < \cdots < X_n$ and consider the ranks $r_i$ of the Y’s only. A R.O.T. which is uniformly M.P. against all $H_1(\rho)$ with $\rho > 0$ does not exist except for small n. The M.P. R.O.T. against small $\rho > 0$ is determined by the largest values of $\sum_{i=1}^{n} (EZ_i)(EZ_{r_i})$, where $EZ_i$ is the expectation of the i-th order statistic in a sample of n from a standard normal distribution. The M.P. unbiased R.O.T. against small values of $|\rho|$ is based on the statistic $\sum_i \sum_j (EZ_i Z_j)(EZ_{r_i} Z_{r_j})$. The M.P. R.O.T. against $\rho$ close to 1 is obtained by expanding the probability of $(r_1, \cdots, r_n)$ in powers of $\{(1 - \rho)/(1 + \rho)\}^{1/2}$. 

3. The Comparison of Percentages in Matched Samples. William G. Cochran, 
Johns Hopkins University. 


In this paper the familiar $\chi^2$ test for comparing the percentages of successes in a number of independent samples is extended to the situation in which each member of any sample is matched in some way with a member of every other sample. This problem has been encountered in the fields of psychology, pharmacology, bacteriology, and sample survey design. A solution has been given by McNemar (1949) when there are only two samples. 

In the more general case, the data are arranged in a two-way table with r rows and c columns, in which each column represents a sample and each row a matched group. The test criterion proposed is 

$$Q = \frac{c(c-1)\sum (T_j - \bar{T})^2}{c\,(\sum u_i) - (\sum u_i^2)},$$

where $T_j$ is the total number of successes in the j-th sample and $u_i$ the total number of successes in the i-th row. If the true probability of success is the same in all samples, the limiting distribution of Q, when the number of rows is large, is the $\chi^2$ distribution with (c − 1) degrees of freedom. The relation between this test and the ordinary $\chi^2$ test, valid when samples are independent, is discussed. 

In small samples the exact distribution of Q can be constructed by regarding the row totals as fixed, and by assuming that on the null hypothesis every column is equally likely to obtain one of the successes in a row. This exact distribution is worked out for eight examples in order to test the accuracy of the $\chi^2$ approximation to the distribution of Q in small samples. The number of samples ranged from c = 3 to c = 5. The average error in the estimation of a significance probability was about 14 per cent in the neighborhood of the 5 per cent level and about 21 per cent in the neighborhood of the 1 per cent level. Correction for continuity did not improve the accuracy of the approximation, although it is recommended when there are only two samples. Another approximation, obtained by scoring each success as “1” and each failure as “0” and performing an analysis of variance on the data, was also investigated. The F-test, corrected for continuity, performed about as well as the $\chi^2$ approximation (uncorrected), but is slightly more laborious. 

The problem of subdividing $\chi^2$ into components for more detailed tests is briefly discussed. 
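
To make the criterion concrete, the following sketch evaluates Q for a small table of matched 0/1 outcomes, reading the criterion as Q = c(c − 1)Σ(T_j − T̄)² / [c Σu_i − Σu_i²]. The function name `cochran_q` and the example table are invented for illustration and are not data from the abstract.

```python
def cochran_q(table):
    """Cochran's Q for a table of 0/1 outcomes.

    table: list of rows (matched groups); each row lists one outcome per sample (column).
    """
    c = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(c)]  # T_j
    row_totals = [sum(row) for row in table]                        # u_i
    t_bar = sum(col_totals) / c
    numerator = c * (c - 1) * sum((t - t_bar) ** 2 for t in col_totals)
    denominator = c * sum(row_totals) - sum(u * u for u in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all successes or all failures")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical data: six matched groups (rows), three samples (columns).
    example = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [1, 0, 0],
        [0, 0, 0],
        [1, 1, 0],
    ]
    print(cochran_q(example))
```

Rows consisting entirely of successes or entirely of failures contribute nothing to either the numerator or the denominator, so they do not affect the value of Q.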


4. A Method of Estimating Components of Variance in Disproportionate Num- 
bers. H. L. Lucas, North Carolina State College. 


By including sufficient effects in the forward solution of the Abbreviated Doolittle 
method, components of variance may be estimated from disproportionate data. The pro- 
cedure is very systematic, and thus, is adaptable to routine computational work. The 
computations will be described, and the utility of the method briefly discussed. 


5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying 
the Values of Two Parameters. (Preliminary Report). Stanley L. Isaacson, 
Columbia University. 


In the Neyman-Pearson theory of testing simple hypotheses, in the one-parameter case, a locally best unbiased region is called “type A.” It is obtained by maximizing the curvature of the power curve at the point $\theta = \theta^0$ specified by the hypothesis, subject to the conditions of size and unbiasedness. For the two-parameter case, Neyman and Pearson considered “type C” regions (Stat. Res. Mem., vol. 2 (1938), p. 36). The definition of these regions requires one to choose in advance a family of ellipses of constant power in an infinitesimal neighborhood of the point $(\theta_1, \theta_2) = (\theta_1^0, \theta_2^0)$ specified by the hypothesis. The natural generalization of a “type A” region is a “type D” region, which maximizes the Gaussian curvature of the power surface at $(\theta_1^0, \theta_2^0)$, subject to the conditions of size and unbiasedness. This definition does not require one to choose a family of ellipses in advance. This approach leads to a new problem in the calculus of variations. A sufficient condition is obtained which plays the role of the Neyman-Pearson fundamental lemma in the “type A” case. An illustrative example is given. (Prepared under sponsorship of the Office of Naval Research.) 


6. A Note on Orthogonal Arrays. Raj Chandra Bose, University of North 
Carolina. 


Consider a matrix $A = (a_{ij})$ with N rows and m columns, each element $a_{ij}$ standing for one of the s integers 0, 1, 2, ⋯, s − 1. Let us take the partial matrix obtained by choosing any $t \le m$ columns of A. Each row now consists of an ordered t-plet of numbers, and since each element has one of s possible values, there are $s^t$ possible t-plets. The matrix A may be called an orthogonal array (N, m, s, t) of size N, m constraints, s levels and strength t, if by choosing any t columns whatsoever every possible t-plet occurs the same number of times. Clearly $N = \lambda s^t$ where $\lambda$ is an integer. Such arrays have been considered by Rao and are useful for various experimental designs. The existence of an orthogonal array $(s^2, m, s, 2)$ is equivalent to the existence of a set of orthogonal Latin squares of side s and m constraints (i.e., the number of Latin squares in the set is m − 2). The fundamental question that can be asked regarding orthogonal arrays is the following: What is the maximum number of constraints for an orthogonal array, given N, s and t? Denote this number by f(N, s, t); then from known properties of Latin squares $f(s^2, s, 2) = s + 1$ if s is a prime or a prime power, and a theorem by Mann states that $f(s^2, s, 2) \ge r + 1$ if $s = p_1^{n_1} \cdots p_k^{n_k}$, where $p_1, \cdots, p_k$ are different primes and r is the minimum of $p_1^{n_1}, p_2^{n_2}, \cdots, p_k^{n_k}$. The following generalisation of Mann’s theorem is proved in this note: 

$$f(N_1 N_2 \cdots N_k,\ s_1 s_2 \cdots s_k,\ t) \ge \operatorname{Min}\{f(N_1, s_1, t),\ f(N_2, s_2, t),\ \cdots,\ f(N_k, s_k, t)\}.$$
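
The defining property of an orthogonal array is easy to test by brute force for small cases. The sketch below is a direct check of the definition and has no connection with the constructions in the note; the function name `is_orthogonal_array` and the example arrays are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product


def is_orthogonal_array(rows, s, t):
    """Check that every t-plet over {0, ..., s-1} occurs equally often in any t columns."""
    n, m = len(rows), len(rows[0])
    if n % (s ** t) != 0:
        return False
    lam = n // (s ** t)  # required number of occurrences of each t-plet
    for cols in combinations(range(m), t):
        counts = Counter(tuple(row[c] for c in cols) for row in rows)
        if any(counts[tup] != lam for tup in product(range(s), repeat=t)):
            return False
    return True


if __name__ == "__main__":
    # An array (4, 3, 2, 2): four rows, three constraints, two levels, strength two.
    small = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(is_orthogonal_array(small, s=2, t=2))

    # An array (9, 4, 3, 2) built from two orthogonal Latin squares of side 3.
    nine = [(i, j, (i + j) % 3, (i + 2 * j) % 3) for i in range(3) for j in range(3)]
    print(is_orthogonal_array(nine, s=3, t=2))
```

The second example has s + 1 = 4 constraints for s = 3, in line with the statement that f(s², s, 2) = s + 1 when s is a prime.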





7. Transformations Related to the Angular and the Square Root. Murray F. 
Freeman and John W. Tukey, Princeton University. 


The use of transformations to stabilize the variance of binomial or Poisson data is familiar (Anscombe, Bartlett, Curtiss, Eisenhart). The comparison of transformed binomial or Poisson data with percentage points of the normal distribution to make approximate significance tests or to set approximate confidence intervals is less familiar. Mosteller and Tukey have recently made a graphical application of a transformation related to the square-root transformation for such purposes, where the use of “binomial probability paper” avoids all computation. We report here on an empirical study of a number of approximations, some intended for significance and confidence work, and others for variance stabilization. (Prepared in connection with research sponsored by the Office of Naval Research). 


8. Standard Inverse Matrices for Fitting Polynomials. F. J. Verlinden, North 
Carolina State College. 


For fitting polynomials of the type $y = b_0 x^0 + b_1 x + b_2 x^2 + \cdots + b_m x^m$, with the x’s equally spaced, published tables of orthogonal polynomials may be used. This procedure does not yield the b’s directly, nor their variances or covariances, although such may be obtained by proper computations which are moderately tedious. In some types of statistical work, the b’s and their variances and covariances may be desired. These may of course be obtained directly by the method of least squares but the computational work is prodigious relative to that for the orthogonal polynomial approach. When the x’s are equally spaced the elements of the variance-covariance matrix may be put in the simple form of sums of powers (including the zero power) of successive integers from zero to n (n equals one less than the number of observations). The elements of the inverses of matrices of this type have been worked out algebraically in terms of n for polynomials up to and including the quintic (m = 5). With these standard inverse matrices, the b’s and their variances and covariances may quickly be obtained once the elements are evaluated numerically. These elements have been evaluated numerically up to n = 20. 
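
The computation described can be sketched in a few lines. The code below is only an illustration under the assumption that the x’s are coded as the integers 0, 1, ⋯, n: it forms the matrix of sums of powers, inverts it exactly with rational arithmetic, and uses the inverse to obtain the b’s. The function names are hypothetical, and the numerical inversion stands in for the algebraic inverses described in the abstract.

```python
from fractions import Fraction


def power_sum_matrix(n, m):
    """Matrix with entry (j, k) equal to the sum of x**(j + k) over x = 0, 1, ..., n."""
    return [[Fraction(sum(x ** (j + k) for x in range(n + 1))) for k in range(m + 1)]
            for j in range(m + 1)]


def invert(a):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    size = len(a)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(size)] for i, row in enumerate(a)]
    for c in range(size):
        p = next(r for r in range(c, size) if aug[r][c] != 0)  # first nonzero pivot
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(size):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[size:] for row in aug]


def fit_polynomial(ys, m):
    """Least squares b_0, ..., b_m for observations ys taken at x = 0, 1, ..., n."""
    n = len(ys) - 1
    inverse = invert(power_sum_matrix(n, m))
    rhs = [sum(Fraction(y) * x ** j for x, y in enumerate(ys)) for j in range(m + 1)]
    return [sum(inverse[j][k] * rhs[k] for k in range(m + 1)) for j in range(m + 1)]


if __name__ == "__main__":
    ys = [1 + 2 * x + 3 * x * x for x in range(6)]
    print(fit_polynomial(ys, 2))
```

Multiplying the inverse by an estimate of the residual variance would give the variances and covariances of the b’s; that step is omitted here.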























9. Mathematical Models in Biology. J. A. Rafferty, Department of Biometrics, 
School of Aviation Medicine, Randolph Field, Texas. 


From the point of view of a bio-medical research administrator, mathematical models will assume a greater role in biological research than heretofore. In anticipation of this trend, certain philosophical implications of models in biological theory and scientific theory in history are examined. A hierarchy of abstraction-levels in biology is delineated, and the role of mathematical models at these levels is illustrated by examples from the literature. Proposals are made for a concentration of mathematical effort on certain important biological problems. Remarks are made on the capabilities and limitations of models in biology. 


10. Small Sample Performance of Biological Statistics. Irwin Bross, Johns 
Hopkins University. 


In this paper the dilution method for estimating bacterial density is investigated by an 
exact small sample method and also by an approximate one. Methodologies and design of 
experiments are compared for various small sample cases. 


11. Methodology in the Study of Physical Measurements of School Children. 
B. G. Greenberg and A. Hughes Bryan, University of North Carolina. 


In a series of investigations to determine by small-sampling technique what physical 
differences, if any, occur between children of differing socio-economic backgrounds, several 
problems of methodology arose. A pilot study was undertaken to assure maximum efficiency 
at each step. This paper reports some of these results. It was found that the children could 
remain dressed (with the exception of boys’ bi-iliac measurement) without changing the 
magnitude of the differences. The pilot study enabled us to decide how many observers to 
use, and how much duplication of measurements by them was necessary. Minimum sample 
sizes were estimated to indicate physical differences of predetermined magnitudes. It was 
found that the age grouping 96-143 months was optimal from the standpoint of indicating 
physical differences between children of differing socio-economic levels. Boys and girls in 
the upper socio-economic levels were both taller and heavier for their age in this age group. 
There were no weight differences, however, when weight was adjusted for age and height. 
Measurement of the bi-iliac and transverse chest diameter provided little additional in- 
formation on physical differences. The calf circumference, an indicator of muscle mass and 
subcutaneous fat, is suggested as being a sensitive supplementary index to indicate physi- 
cal differences when age and height are adjusted. 


12. Tetrad Analysis in Yeast. A. S. Householder, Oak Ridge National Laboratory, 
Oak Ridge, Tennessee. 


In neurospora all four products of meiosis are recovered in the four spores of an ascus. In crosses AB × ab the asci are of three types, designated I, II or III according as all four, none, or two spores resemble parents. Frequencies of these types, $P$, $P'$ and $P''$, are the observables. If there were no exchange $P''$ would be zero; and one should have $P' = 0$ or ½ according to whether the loci were on the same or different chromosomes. 

Assuming only that no exchange occurs between sister chromatids and neglecting chromatid interference, one can calculate without further assumptions a frequency $P''$ of exchanges between a single locus and its centromere from data on three or more genes taken in pairs by equations 

$$s_{ij} = s_{0i}\, s_{0j}, \qquad P'' = 2(1 - s)/3,$$

where the subscript 0 refers to a centromere. Lindegren makes such calculations from his own data, by taking groups of three, but makes no effort to reconcile discrepancies. Neyman’s modified chi-square, however, permits combining all observations in a set of equations that yields easily to rapidly converging iterative solution. The equations are 


2s; = od > nj; 3)?(nz} -- nz) = J sj(nj+ nj; 4)2(2nz} _ nij"), 
isi ivi 

where $n'_{ij}$ is the number in classes I and II combined for the loci i and j, $n''_{ij}$ the number 
in class III, and only those pairs (i, j) are included which are found to be independent. 
The argument of A. R. G. Owen (Proc. Roy. Soc., Ser. B, Vol. 136 (1949) pp. 67-94.) 

can be paraphrased for the present case and a suitable generating function P(A, u) is being 

sought providing a metric. The specific one proposed by Owen is ruled out since s = 

P(—i,u) takes on a negative value for one locus, which is not possible with Owen’s function. 


13. Contribution to the Probabilistic Theory of Neural Nets. I. Randomization 
of Refractory Periods and of Stimulus Intervals. Anatol Rapoport, University 
of Chicago. 


Aggregates of neurons are considered in which the frequency of occurrence of neurons 
with a specified value of the refractory period follows certain probability distributions. 
Input-output functions are derived from such aggregates. In particular, if input and output 
intensities are defined in terms of stimulus frequencies and firing frequencies per neuron 
respectively, it is shown that a rectangular distribution of refractory periods leads to a 
logarithmic input-output curve. If input and output are defined in terms of the total 
number of stimuli and firings in the aggregate, it is shown how the "mobilization" picture
leads to the logarithmic input-output curve. 

By randomizing the intervals between stimuli received by a single neuron and by introducing
an inhibitory neuron a very simple "filter net" can be constructed whose output
will be sensitive to a particular range of the input, and this range can be made arbitrarily 
small. 


14. Theoretical and Experimental Aspects in the Removal of Air-Borne Matter
by the Human Respiratory Tract. H. D. Landahl, University of Chicago.


The principal factors governing the fate of a particle in the respiratory tract are impac- 
tion due to inertia, settling due to gravity and Brownian movements. For a given respira- 
tory pattern, it is possible to calculate the probable fate of a particle from a knowledge of 
the geometry of the passages. These calculations have been carried out in such a manner as 
to obtain the theoretical amounts of material deposited in various regions of the lungs as 
well as the relative amounts in various fractions of the expired air. Similarly, it is possible 
to estimate the probable fate of a particle which passes through the nasal passages. Ex- 
periments have been carried out to verify a number of these predictions. On the whole, the 
agreement, as illustrated in the slides, is fairly satisfactory when one considers the com- 
plexity of the calculations. 


15. An Application of Biometrics to Zoological Classification. F. M. Wadley,
Navy Department, Washington, D. C.


Statistical problems in taxonomy are discussed; attention must be paid to variation of
individuals as well as of group means. Covariance analysis and the discriminant function
technique are applied to multiple measurements in groups of molluscan fossils.


16. The Analysis of Hematological Effects of Chronic Low-Level Radiation.
Jack Moshman, United States Atomic Energy Commission, Oak Ridge, Tennessee.


Several methods are investigated for analyzing the possible effects of chronic low-level
irradiation upon the employees of the operating contractors of the US AEC. The effects 
investigated are those on the red blood count, hemoglobin, white blood count, lymphocytes 
and neutrophils. The analysis includes measurements of significant differences among 
individuals, geographic sites and the exploration of various indices of exposure to radiation. 
A non-parametric determination of trend values for individuals which may be applied 
to mass data is considered. 


17. Statistical Problems in Psychological Testing. Edward E. Cureton,
University of Tennessee.


Though great progress has been made in mathematical statistics in recent years, a number 
of the major statistical problems encountered in the development and use of psychological 
tests remain unsolved. Some of these problems are outlined, with particular reference to 
the mathematical models and assumptions implied by psychological theory, by the nature 
of the experimental data, and by the conditions under which the results and findings are 
to be applied. 


18. Accuracy of a Linear Prediction Equation in a New Sample. George E.
Nicholson, Jr., University of North Carolina.


The problem considered is as follows. Given two samples $S_1$ and $S_2$ of $N_1$ and $N_2$ observations
on a $p + 1$ character random variable $(y, x_1, \cdots, x_p)$. Let $Y_1$ and $Y_2$ be the linear
regression equations computed by the method of least squares from each sample. The effect of
using $Y_1$ to predict the $y$'s in $S_2$ is considered. The ratio

$$k = \frac{\sum (y_2 - Y_1)^2}{\sum (y_2 - Y_2)^2}$$

is used as a measure of the predicting efficiency of $Y_1$ in $S_2$ relative to $Y_2$ when the $x_i$ are
fixed for the usual regression model. The general multivariate case is also considered.
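
A minimal sketch of such a comparison, restricted to a single predictor (p = 1) for brevity.
It assumes the ratio is the squared prediction error in the second sample of the line fitted
to the first sample, divided by that of the line fitted to the second sample itself; the data
and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_error(line, xs, ys):
    """Sum of squared deviations of ys from the fitted line."""
    intercept, slope = line
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def prediction_ratio(sample1, sample2):
    """Squared error in sample 2 of the sample-1 line over that of the sample-2 line."""
    line1 = fit_line(*sample1)
    line2 = fit_line(*sample2)
    return squared_error(line1, *sample2) / squared_error(line2, *sample2)


# Hypothetical samples given as (x values, y values).
s1 = ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
s2 = ([1.0, 2.0, 3.0, 4.0], [0.8, 2.3, 2.9, 4.4])
k = prediction_ratio(s1, s2)
```

Because the line fitted to the second sample minimizes the squared error there, a ratio
defined this way cannot fall below 1.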


19. Independence of Quadratic Forms in Normally Correlated Variables. Yukiyosi
Kawada, Tokyo University of Literature and Science, Tokyo, Japan.


An extension is given of theorems of Craig, Hotelling and Matérn which includes the
following theorem, proved by a new method: If two quadratic forms $Q_1$, $Q_2$ in normally and
independently distributed variates with zero means and unit variances satisfy the four
conditions $E(Q_1^i Q_2^j) = E(Q_1^i) E(Q_2^j)$, for $i, j = 1, 2$, then the product of the matrices of the
two forms in either order is zero.



20. Bounds on the Distribution of Chi-square. S. A. Vora, University of North 
Carolina. 


Let

$$\chi^2 = \sum_{i=1}^{k} (v_i - n p_i)^2 / n p_i, \qquad \chi'^2 = \sum_{i=1}^{k} (v_i + \tfrac{1}{2} - N p_i)^2 / N p_i,$$

where $v_i \ge 0$, $\sum_1^k v_i = n$, $p_i > 0$, $\sum_1^k p_i = 1$ and $N = n + k/2$. Bounds on the multinomial
probability $\pi$ in terms of $\chi'^2$ are obtained. A triangular transformation of

$$x_i = (v_i + \tfrac{1}{2} - N p_i) / \{N p_i (1 - p_i)\}^{1/2} \qquad (i = 1, \cdots, k - 1)$$

to $y_i$ is applied so that


where $d$ is determined later by equating the coefficients of $\chi'^2$. Certain rectangles $r(v)$
with $(y_1, \cdots, y_{k-1})$ as a mid-point are non-overlapping and cover the entire space $R_{k-1}$
for $v_i = 0, \pm 1, \pm 2, \cdots$. If $\chi'^2 < c$, then bounds on $\pi$ in terms of the integral of the $(k - 1)$
dimensional normal frequency function over the rectangle $r(v)$ are obtained. Prob. $\{\chi'^2 < c\}$
is the sum of $\pi$ over $\chi'^2 < c$, so the integral over the sum of rectangles whose mid-points
lie within the hypersphere $\chi'^2 < c$ is considered. Two hyperspheres, one which contains the
sum of those rectangles, and one which is contained in it are used for the bounds, giving

$$\lambda_2 F_{k-1}(c_2) < \text{Prob.} \{\chi'^2 < c\} < \lambda_1 F_{k-1}(c_1),$$

where $F_{k-1}(x)$ is a chi-square distribution function with $(k - 1)$ degrees of freedom and
$\lambda_1, \lambda_2, c_1, c_2$ are functions of $c$, $n$, $k$ and $p_1, \cdots, p_k$. As $n \to \infty$, both bounds tend to
$F_{k-1}(c)$. Bounds of the same form are obtained for Prob. $\{\chi^2 < C\}$. Closer bounds
for Prob. $\{\chi^2 < C\}$ are given in terms of a non-central chi-square distribution.
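
For concreteness, the two statistics can be computed side by side. This is a sketch that
assumes the second statistic adds 1/2 to every cell count and uses N = n + k/2 in place of n,
as the abstract's definitions read; the counts and probabilities are made up.

```python
def chi_square(counts, probs):
    """Ordinary statistic: sum of (v_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(counts, probs))


def chi_square_shifted(counts, probs):
    """Shifted statistic: counts moved by 1/2 and n replaced by N = n + k/2."""
    k = len(counts)
    big_n = sum(counts) + k / 2
    return sum((v + 0.5 - big_n * p) ** 2 / (big_n * p) for v, p in zip(counts, probs))


# Hypothetical multinomial outcome with k = 3 cells.
counts = [15, 15, 30]
probs = [0.2, 0.3, 0.5]
plain = chi_square(counts, probs)
shifted = chi_square_shifted(counts, probs)
```

Since the shifted counts still sum to N, the shifted statistic has the same algebraic form as
the ordinary one, evaluated at a point displaced by 1/2 in each cell.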


21. Estimation of Genetic Parameters. C. R. HENDERSON, Cornell University. 


Many applications of genetics and statistics to the improvement of plants and animals
deal with experimental data for which the underlying model is assumed to be

$$y_\alpha = \sum_{i=1}^{p} b_i x_{i\alpha} + \sum_{i=1}^{q} u_i z_{i\alpha} + e_\alpha,$$

where $b_i$ are unknown fixed parameters, $x_{i\alpha}$ and $z_{i\alpha}$ are observable parameters, the $u_i$ are
a random sample from a multivariate normal distribution with means zero and covariance
matrix $\| \sigma_{ij} \|$, and the $e_\alpha$ are normally and independently distributed with means zero
and variances $\sigma_\alpha^2$. If $\sigma_{ij} = 0$ when $i \ne j$ and if $\sigma_\alpha^2 = \sigma^2$, the model is the one usually
assumed when components of variance are estimated.

Three different estimation problems are involved, (1) estimation of $b_i$ under the assumptions
of the model, (2) estimation of $u_i$ and (3) estimation of $\sigma_{ij}$. The first two problems
are not solved satisfactorily by the least squares procedure in which the $u_i$ are regarded
as fixed, but the maximum likelihood solution does lead to a satisfactory estimation
procedure.

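Assuming that the $\sigma_{ij}$ and $\sigma_\alpha^2$ are known, the joint maximum likelihood estimates of
$b_i$ and $u_i$ are the solution to the set of linear equations

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{h\alpha} x_{i\alpha} / \sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sum_\alpha x_{h\alpha} z_{i\alpha} / \sigma_\alpha^2\Big) = \sum_\alpha x_{h\alpha} y_\alpha / \sigma_\alpha^2,$$

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{i\alpha} z_{h\alpha} / \sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sigma^{ih} + \sum_\alpha z_{i\alpha} z_{h\alpha} / \sigma_\alpha^2\Big) = \sum_\alpha z_{h\alpha} y_\alpha / \sigma_\alpha^2.$$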


Some important applications of this estimation procedure to genetic studies are described 
and certain computational short-cuts are suggested. 

The problem of estimating $\sigma_{ij}$ has not been solved satisfactorily although under certain
quite general assumptions the equations for the joint estimation of $b_i$, $u_i$, $\sigma_{ij}$, and $\sigma_\alpha^2$
can easily be written. The solution to the equations, however, is too difficult to make the
procedure practical. Nevertheless unbiased estimates of $\sigma_{ij}$ can be obtained by equating
to their expected values the differences between certain reductions in sums of squares
computed by least squares and solving for the $\sigma_{ij}$. In general, the expectation of the
reduction due to $b_1, \cdots, b_p, u_1, \cdots, u_k$ $(k \le q)$ is $\sum_g \sum_h d^{gh} E(Y_g Y_h)$, where $d^{gh}$ are the elements
of the matrix which is the inverse of the $(p + k)^2$ matrix of coefficients and the $Y_g$ are the
right members of the least squares equations.


22. Estimating the Mean and Standard Deviation of Normal Populations from
Double Truncated Samples. A. C. Cohen, Jr., University of Georgia.


The method of maximum likelihood is employed to obtain estimates of the mean and 
standard deviation of a normally distributed population from double truncated random 
samples. Two cases are considered. In the first, the number of missing variates is assumed 
to be unknown. In the second, the number of missing (unmeasured) variates in each tail is 
known. Variances for the estimates involved in each case are obtained from the maximum 
likelihood information matrices. A numerical example is given to illustrate the practical 
application of the estimating equations obtained for each of the two cases considered. 


23. Minimax Estimates of Location and Scale Parameters. Gopinath Kallianpur,
University of North Carolina.


If the joint fr. f. of the random variables $X_1, \cdots, X_n$ contains only a scale parameter
and is of the form


then under mild restrictions the following theorem is proved:

THEOREM 1: If the loss function is of the form $W\left(\frac{\hat{\alpha} - \alpha}{\alpha}\right)$, the best or minimax estimate
$\hat{\alpha}_0(x)$ of $\alpha$ minimizes


w.r.t. $\hat{\alpha}$ and further,

$$\hat{\alpha}_0(\mu x_1, \cdots, \mu x_n) = \mu \hat{\alpha}_0(x_1, \cdots, x_n), \qquad \mu > 0.$$

When both location and scale parameters are present and the joint fr. f. is of the form


(under conditions similar to those in Theorem 1) we obtain two results for the estimation
of $\theta$ and $\alpha$, respectively, one of which is:

THEOREM 2: If the loss function is of the form $W\left(\frac{\hat{\theta} - \theta}{\alpha}\right)$, the best estimate $\hat{\theta}_0(x)$ of $\theta$ minimizes


These theorems have been applied to derive minimax estimates in the case of standard
distributions. Finally, the problem of estimating the difference between the location
parameters of two populations is briefly considered. The results obtained in this paper are
a continuation of the line of approach suggested in Theorem 5 of Wald's "Contributions
to the Theory of Statistical Estimation and Testing Hypotheses" (Annals of Math. Stat.,
Vol. 10 (1939), pp. 299-326). (The present work was carried out under Office of Naval
Research contract.)






24. On Some Features of the Neyman-Pearson and the Wald Theories of Statis- 
tical Inference, Their Interrelations and Their Bearing on Some Usual Problems 
of Statistical Inference. S. N. Roy, University of North Carolina.


With two alternative hypotheses $H_1$ and $H_2$ it is shown that (i) the most powerful test
of $H_1$ with respect to $H_2$ is automatically an unbiased test in the sense that its power is
never less than (and usually greater than) the level of significance $\alpha$ and (ii) there is also
a least powerful test with its power not greater (usually less) than $\alpha$. This means that all
tests have powers lying in between, which gives a complete picture of the possible family 
of tests and provides a basis for defining efficiency of tests. 

With the first kind of error $\alpha$ is tied up a minimum second kind of error $\beta$ (complementary
to the maximum power $P$), and the level at which $\alpha$ is fixed depends upon some
compromise between $\alpha$ and $\beta$. This intuitive approach is formalised by the introduction of
loss functions related to and a priori probability weights for $H_1$ and $H_2$, thus leading to
the first stage in the Wald treatment of dichotomy with two solutions in the observation 
space corresponding respectively to minimum and maximum total risks. This is imme- 
diately generalised to the first stage in the Wald treatment of multichotomy with minimum 
and maximum total risk solutions. An important special case is discussed in which all the 
possible alternatives to a particular hypothesis are, by our test procedure, indistinguish- 
able among themselves, thus effectively forming only one alternative to the hypothesis, 
which means a degenerate multichotomy. The bearing of this on most powerful tests on 
an average under the Neyman-Pearson theory is also discussed. 

The problem of testing a composite hypothesis which is usually treated in terms of the 
Neyman-Pearson theory is posed and treated in terms of the (first stage) Wald theory and 
an indication is given of how these notions could be applied to the usual problems of uni- 
variate and multivariate analysis. 
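
Statements (i) and (ii) of the first paragraph can be illustrated numerically in a simple
case. The sketch below assumes one observation that is N(0, 1) under the first hypothesis
and N(mu, 1) with mu > 0 under the second; this setting and the function name are chosen
here for illustration and do not come from the abstract.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal distribution


def power_range(alpha, mu):
    """Return (least, most) power among level-alpha tests in the normal shift case.

    The most powerful test rejects for large observations, the least
    powerful one for small observations (likelihood ratio increasing in x).
    """
    upper_cut = STANDARD.inv_cdf(1 - alpha)
    lower_cut = STANDARD.inv_cdf(alpha)
    most = 1 - STANDARD.cdf(upper_cut - mu)
    least = STANDARD.cdf(lower_cut - mu)
    return least, most


least, most = power_range(alpha=0.05, mu=1.0)
```

Every other test of the same level has power between these two values, and both collapse
to the level itself as mu tends to zero.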


25. Note on Uniformly Best Unbiased Estimates. R. C. Davis, Naval Ordnance 
Test Station, Inyokern, California.


For the estimation in an absolutely continuous probability distribution of an unknown
parameter which does not possess a sufficient statistic, it is shown that no unbiased esti- 
mate for the unknown parameter exists which attains minimum variance uniformly over 
a parameter set of arbitrary nature. This result demonstrates the impossibility of obtain- 
ing a generalized sufficient statistic first proposed by Bhattacharyya. Although not used 
in this note it is surmised that Barankin’s powerful results on locally best unbiased esti- 
mates can be applied to yield further results in this direction. 


26. Competitive Estimation. Herbert Robbins, University of North Carolina.


Let $\theta$ be a vector random variable with distribution function $G(\theta)$ and let $x$ be a vector
random variable whose frequency function $f(x; \theta)$ depends on $\theta$. Two statisticians, A and B,
are required to estimate $\theta$ from the value of $x$. If A's estimate is closer to $\theta$ he wins one
dollar from B, and vice versa; in case of a tie no money changes hands. It is shown that A
should estimate $\theta$ by the function $a(x)$ = median of posterior distribution of $\theta$ given $x$;
his expected gain will then be $\ge 0$ whatever estimate B may use. If $G(\theta)$ is not known to A
he should estimate it from the series of values of $\theta$ which have been observed in previous
trials. If these are not known, A should estimate $G(\theta)$ from the values of $x$ which have
previously occurred; how this may be done is discussed elsewhere (see Abstract 35).

From the point of view of the theory of games, when $G(\theta)$ is unknown we have a game in
which the "rules" are unknown and must be successively estimated from past experience.
Other examples arise whenever a game involves random devices whose probability distributions
are not known to the players but must be inferred by statistical methods, in
general from secondary variables which contain only part of the total information. The
role of statistical inference in such "long term" games is fundamental.
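
The median rule can be checked on a small example. The sketch assumes a scalar parameter
with a discrete posterior distribution that has already been computed from the observation;
the support points, probabilities and rival estimates are made up.

```python
def posterior_median(values, probs):
    """Smallest support point whose cumulative posterior probability reaches 1/2."""
    total = 0.0
    for value, prob in sorted(zip(values, probs)):
        total += prob
        if total >= 0.5:
            return value
    raise ValueError("probabilities must sum to at least 1/2")


def expected_gain(a, b, values, probs):
    """Expected winnings of estimate a against estimate b (one dollar per win, ties pay nothing)."""
    gain = 0.0
    for value, prob in zip(values, probs):
        dist_a, dist_b = abs(a - value), abs(b - value)
        if dist_a < dist_b:
            gain += prob
        elif dist_b < dist_a:
            gain -= prob
    return gain


# Hypothetical posterior distribution of the parameter given the observation.
values = [0, 1, 2, 3, 5]
probs = [0.10, 0.20, 0.30, 0.25, 0.15]
a = posterior_median(values, probs)
rivals = [-1, 0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6]
gains = [expected_gain(a, b, values, probs) for b in rivals]
```

Whichever side of the median the rival estimate lies on, at least half of the posterior
probability is on A's side of the midpoint between the two estimates, so no entry of `gains`
is negative.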


27. The Effect of an Unknown 'Location Disturbance' on "Student's" t based
on a Linear Regression Model. Uttam Chand, Boston University.


Consider $y_1, \cdots, y_{N_1}, y_{N_1+1}, \cdots, y_N$, a set of observations ordered in time. If the
y's are normally and independently distributed according to $N(\alpha + \beta(t - \bar{t}), \sigma^2)$ and we
want to find out if the y's have changed with time, we usually employ a "Student's" t type
of statistic with $N - 2$ degrees of freedom. If, as a consequence of the impact of a certain
unknown political or economic change in the past on the y's, the y's actually constitute
two independent, normal samples $y_1, \cdots, y_{N_1}$; $y_{N_1+1}, \cdots, y_N$ distributed according to
$N(m_1, \sigma^2)$, $N(m_2, \sigma^2)$ respectively, a two-sample "Student's" t also based on $N - 2$ degrees
of freedom would be the appropriate statistic to use for the hypothesis $m_1 = m_2$. If, in
fact, the latter situation describes the correct state of affairs, and the statistician employs
the "Student's" t based on the regression model, he commits an error. The present paper
investigates the nature of such an error in the light of the point of impact as determined
by the magnitude of $N_1$, and the intensity of the impact as determined by the standardized
'distance' $(m_2 - m_1)/\sigma$ of this extraneous 'shock' on the ordered set of observations y.


28. Corrections for Non-normality for the Two-sample t and the F Distributions
Valid for High Significance Levels. Ralph A. Bradley, McGill University.


The effects of non-normality of the parent population on common tests of significance 
have long been of concern in the application of statistical methods to experimental data. 
In this paper, the two-sample t-statistic is expressed as a simple multiple of the cotangent 
of an angle between two lines in a space of dimensionality one less than the total of the 
sample sizes; the F-statistic for $k$ samples is expressed as a multiple of the cotangent of
an angle between a line and a plane of (k — 1) dimensions in a space, again, of dimension- 
ality one less than the total of the sample sizes. The geometrical formulation is such as to 
suggest approximations to the distributions of these statistics valid for large values of 
the statistics, and these approximations are obtained. The approximations are shown to be 
exact in the special cases where the parent population is normal, and a method of evalua- 
tion of correction factors is given for a wide class of parent populations. The approximation 
procedures are valid for the distributions under both null and non-null hypotheses. 
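
The cotangent form of the two-sample statistic can be verified numerically. The sketch below
uses one standard choice of the two lines, which is an assumption here rather than a detail
given in the abstract: the data vector measured from the grand mean, and the direction that
contrasts the two groups. Both are orthogonal to the vector of ones, so the angle lies in a
space of dimension one less than the total sample size. The data are made up.

```python
import math


def two_sample_t(a, b):
    """Classical pooled-variance two-sample t statistic."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    within = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    pooled_var = within / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def t_from_angle(a, b):
    """Same statistic as sqrt(N - 2) times the cotangent of an angle."""
    n1, n2 = len(a), len(b)
    data = list(a) + list(b)
    grand = sum(data) / len(data)
    centered = [x - grand for x in data]
    contrast = [1 / n1] * n1 + [-1 / n2] * n2
    dot = sum(c * d for c, d in zip(contrast, centered))
    norm_c = math.sqrt(sum(c * c for c in contrast))
    norm_d = math.sqrt(sum(d * d for d in centered))
    theta = math.acos(dot / (norm_c * norm_d))
    return math.sqrt(n1 + n2 - 2) / math.tan(theta)


# Hypothetical samples.
sample_a = [5.1, 4.9, 6.2, 5.8]
sample_b = [4.0, 4.4, 3.9]
t_classical = two_sample_t(sample_a, sample_b)
t_geometric = t_from_angle(sample_a, sample_b)
```

Large values of the statistic correspond to small angles, which is the region where the
abstract's approximations are said to apply.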


29. Some Tests Based on the Empirical Distribution Function. (Preliminary 
Report). James F. HANNAN, University of North Carolina. 


Let $X = (X_1, X_2, \cdots, X_n)$ be an independent sample of $n$ where $X_i$ has the continuous
c.d.f. $F(x)$. Let $S_n(x)$ be the empirical distribution function. Acceptance regions of
the type $\{X : S_n(x) < \phi(x) \text{ for all } x\}$ are considered for different specifications of $\phi$ and their
probabilities evaluated. The method of evaluation consists in identifying the regions with
regions defined in terms of the order statistics of a sample of $n$ from the uniform distribution
on the interval (0, 1). The result obtained for $\phi(x) = F(x) + c/n$, $0 < c$, integral $< n$
is used to provide a direct proof of the Kolmogoroff result

$$\lim_{n \to \infty} P[n^{1/2} \sup_x (S_n(x) - F(x)) < z] = 1 - e^{-2z^2},$$

while that obtained for $\phi(x) = F(x) + t$, $0 < t < 1$, gives the exact c.d.f. of the statistic
$\sup_x (S_n(x) - F(x))$.
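
The limiting result can be compared with finite-sample values. The sketch below does not
reproduce the abstract's derivation; it relies on a closed-form finite sum for the upper tail
of the one-sided statistic that is commonly attributed to Smirnov and to Birnbaum and
Tingey, and that formula is an assumption brought in here for illustration.

```python
import math


def one_sided_tail(n, t):
    """P( sup_x (S_n(x) - F(x)) >= t ) for 0 < t <= 1, via a finite sum."""
    top = math.floor(n * (1 - t))
    total = 0.0
    for j in range(top + 1):
        total += math.comb(n, j) * (t + j / n) ** (j - 1) * (1 - t - j / n) ** (n - j)
    return t * total


def limiting_cdf(z):
    """Limit of P[ sqrt(n) * sup_x (S_n(x) - F(x)) < z ] as n grows."""
    return 1 - math.exp(-2 * z * z)


# Compare the finite-n distribution function with its limit at z = 1.
n, z = 400, 1.0
exact_cdf = 1 - one_sided_tail(n, z / math.sqrt(n))
gap = abs(exact_cdf - limiting_cdf(z))
```

For n = 1 the sum reduces to 1 - t, the tail probability of a single uniform observation,
which gives a quick sanity check on the formula.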


30. On a Generalization of the Behrens-Fisher Problem. (By Title). John E.
Walsh, Rand Corporation, Santa Monica, California.


Let $m + n$ independent observations be available where it is only known that a specified
$m$ of them are from continuous symmetrical populations with common median $\mu$ while the
remaining $n$ are from continuous symmetrical populations with common median $\nu$. This is
the generalization of the Behrens-Fisher problem investigated; some tests and confidence
intervals for $\mu - \nu$ which are valid for the generalized situation are presented. For definiteness,
suppose that $n \le m$. The procedure used is to subdivide the $m$ observations (common
median $\mu$) into $n$ groups of nearly equal size and form the mean of the observations for
each group. Pair the $n$ means with the remaining $n$ observations and subtract the value of
each observation from the value of the mean with which it is paired. The resulting $n$ values
represent independent observations from populations with common median $\mu - \nu$. Tests
and confidence intervals for $\mu - \nu$ are obtained by applying the results of "Applications
of Some Significance Tests for the Median Which are Valid Under Very General Conditions"
(Jour. Amer. Stat. Assn., Vol. 44 (1949), pp. 342-55) to these $n$ values. To measure
the "information" lost by using the generalized tests when one actually has two independent
samples from normal populations, power efficiencies are computed with respect
to: (a) Scheffé's "best" t-test solution and (b) most powerful solution when ratio of variances
is known. Case (a) yields an upper bound while case (b) furnishes a lower bound
for the actual efficiency.
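
The construction of the n paired values can be sketched as follows. The sketch assumes the
groups are formed from consecutive observations in the order given, with the larger groups
first; the abstract only requires groups of nearly equal size, and the grouping must not
depend on the observed values. The function name and data are illustrative.

```python
def paired_differences(first, second):
    """Group means of `first` (m values) minus the paired values of `second` (n <= m)."""
    m, n = len(first), len(second)
    if n == 0 or n > m:
        raise ValueError("need 0 < n <= m")
    base, extra = divmod(m, n)  # `extra` groups get one additional observation
    differences = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        group = first[start:start + size]
        start += size
        differences.append(sum(group) / size - second[i])
    return differences


# Hypothetical data: m = 7 observations with one median, n = 3 with another.
diffs = paired_differences([1, 2, 3, 4, 5, 6, 7], [0, 1, 2])
```

A one-sample procedure for the median, such as a sign test, would then be applied to
`diffs`.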


31. Construction of Partially Balanced Designs with two Accuracies. (By Title).
S. S. SHRIKHANDE, University of North Carolina and Nagpur College, Nagpur,
India.


Various methods of construction of partially balanced designs first introduced by Bose 
and Nair (Sankhyā, Vol. 4 (1939), pp. 337-373) have been considered. Two of the methods
given are generalisations of a difference theorem given by them. Another method is the 
inversion of an unreduced balanced incomplete block design with k = 2. Use has also been 
made of the existing balanced incomplete block design in another direction. A number of 
designs can also be obtained by methods of finite geometries and especially by omitting a 
number of treatments and certain blocks from the complete lattice designs. Use of curves 
and surfaces in finite geometries and the use of multifactorial designs given by Plackett 
and Burman (Biometrika, Vol. 33 (1946), pp. 305-325) are also indicated. 


32. Designs for Two-way Elimination of Heterogeneity. (By Title). S. S.
SHRIKHANDE, University of North Carolina and Nagpur College, Nagpur,
India.


Use has been made of the existing balanced and some partially balanced designs for
two-way elimination of heterogeneity with at most two accuracies. Particular cases of these
designs were given by Youden (Contributions from Boyce Thompson Institute, Vol. 9 (1937), 
pp. 317-326) and Bose and Kishen (Science and Culture (1939), pp. 136-137). The method 
depends upon interchanging the positions of various treatments in the different columns 
(blocks), if necessary, so as to satisfy certain conditions. 


33. Designs for Animal Feeding Experiments. (By Title). S. S. SHRIKHANDE,
University of North Carolina and Nagpur College, Nagpur, India.


In animal-feeding experiments change-over designs are generally preferable to continuous
feeding experiments. In change-over designs both the direct and carry-over treatment
effects are important. Use of balanced and partially balanced incomplete block designs 
toward this end has been considered. 


34. A Truncated Sequential Procedure for Interval Estimation, with Applications
to the Poisson and Negative Binomial Distributions. (Preliminary Report).
(By Title). D. Martin Sandelius, University of Uppsala, Sweden, and
University of Washington.


Let $x, y_1, y_2, \cdots$ be a sequence of random variables defined in $(0, \infty)$, and let $n$ be the
smallest integer satisfying $\sum_{i=1}^{n+1} y_i > tx$, where $t > 0$ is a non-random quantity. Define $u_k$
either as $\sum_{i=1}^{k} y_i / x$ or as the smallest integer exceeding $\sum_{i=1}^{k} y_i / x$, $k = 1, 2, \cdots$. Given the
distribution function $F(x, \theta)$ of $x$ and, for any $t$, the conditional distribution of $n$ with
respect to $x$, the distribution of $u_k$ is obtained. The problem is to determine a confidence
interval for $\theta$ with confidence coefficient $1 - \alpha$ on the basis of either an observation on
$u_k$, if $u_k \le t$, or an observation on $n$, if $n \le k - 1$. The following procedure is proposed:
If $u_k \le t$, choose $\theta_{10}$ and $\theta_{11}$ according to a rule satisfying Prob $(\theta_{10} < \theta < \theta_{11} \mid u_k \le t) \ge 1 - \alpha$.
If $n \le k - 1$, choose $\theta_{20}$ and $\theta_{21}$ such that Prob $(\theta_{20} < \theta < \theta_{21} \mid n \le k - 1) \ge 1 - \alpha$.
For continuous $u_k$ the following cases are discussed: a) $x = \theta$ with probability 1, and $n$
has, for any $t$, a Poisson distribution with mean $t\theta$, b) $x$ has a Gamma distribution with
mean $\theta$, and the conditional distribution of $n$ with respect to $x$ is, for any $t$, a Poisson
distribution. Both cases may, for instance, be applied to bacterial counting.


35. A Generalization of the Method of Maximum Likelihood: Estimating a
Mixing Distribution. (Preliminary Report). (By Title). Herbert Robbins,
University of North Carolina.


Let $\theta$ be a vector random variable with distribution function $G(\theta)$ belonging to some
class $\mathcal{G}$, let $x$ be a vector random variable whose frequency function $f(x; \theta)$ depends on $\theta$,
and let $g^*(x) = \int f(x; \theta) \, dG(\theta)$ be the resulting frequency function of $x$. From a sample
$x_1, x_2, \cdots$ it is required to estimate $G(\theta)$. The generalized method of maximum likelihood
consists in using the estimates $G_n(\theta; x_1, \cdots, x_n)$ in $\mathcal{G}$ for which $\prod g^*(x_i)$ is a maximum.
Under certain restrictions this method is consistent as $n \to \infty$.

Any consistent method of estimating the mixing distribution $G(\theta)$ from the sequence
$x_1, x_2, \cdots$ yields a solution of parametric statistical decision problems in the following
manner: from past values $x_1, \cdots, x_{n-1}$ we estimate $G(\theta)$, and then use the corresponding
Bayes solution of the decision problem to reach our decision for $x_n$, even though the value
$\theta_n$ which produced $x_n$ is different from those which produced $x_1, \cdots, x_{n-1}$. In certain cases
of long-term experimentation this approach seems more reasonable than the minimax
method which decides on the course of action appropriate to $\theta_n$ on the basis of $x_n$ only,
and ignores the information about the prior distribution of $\theta$ which is contained
in $x_1, \cdots, x_{n-1}$.


36. Smallest Average Confidence Sets for the Simultaneous Estimation of k
Normal Means. (By Title). Raghu Raj Bahadur, University of North Carolina.


Let $v = (x_{11}, \cdots, x_{1n_1}; \cdots; x_{k1}, \cdots, x_{kn_k})$ denote the combined sample point in
samples of sizes $n_1, n_2, \cdots, n_k$ from normal populations $\pi_1, \pi_2, \cdots, \pi_k$, respectively,
$\pi_i$ having mean $\mu_i$ and variance $\sigma_i^2$. Writing $\mu = (\mu_1, \mu_2, \cdots, \mu_k)$, denote the $k$ dimensional
Euclidean space of all points $\mu$ by $R$. Given any parameter point $(\mu, \sigma)$, where
$\sigma = (\sigma_1^2, \sigma_2^2, \cdots, \sigma_k^2)$, and any set-valued function $f(v)$ defined for all sample points $v$ and
having subsets of $R$ as its values (which satisfies certain measurability hypotheses), let
$\alpha(f \mid \mu, \sigma)$ = probability of the statement "$\mu \in f(v)$" being false, and $\beta(f \mid \mu, \sigma)$ = expected
Lebesgue measure of $f(v)$. We consider the problem of constructing $f(v)$ so as to make both
$\alpha$ and $\beta$ "as small as possible." One of the results obtained is as follows: Given
$p$, $0 < p < 1$, let

$$f^*_{\lambda, p}(v) = \Big\{\mu : \sum_{i=1}^{k} n_i (\bar{x}_i - \mu_i)^2 / l_i < \phi(p) \cdot \sum_{i=1}^{k} n_i s_i^2 / l_i\Big\},$$

where $\bar{x}_i = n_i^{-1} \sum_j x_{ij}$, $s_i^2 = n_i^{-1} \sum_j (x_{ij} - \bar{x}_i)^2$, $\lambda = (l_1, l_2, \cdots, l_k)$, the $l_i$'s being given positive
constants, and $\phi(p)$ being determined by $P(\chi_k^2 > \phi(p) \cdot \chi_{N-k}^2) = p$, where $\chi_k^2$, $\chi_{N-k}^2$ are
independent chi-square variables with $k$, $N - k$ degrees of freedom $(k < N = \sum_i n_i)$. Then
(a) obviously $\alpha(f^*_{\lambda, p} \mid \mu, c\lambda) = p$ for all $\mu$ and all $c$, $0 < c < \infty$, and (b) if $f(v)$ is any other
function such that $\alpha(f \mid \mu, c\lambda) \le p$ for all $\mu$ and all $c$, either (i) $f(v)$ and $f^*_{\lambda, p}(v)$ differ by
a set of measure zero for almost every $v$, or (ii)

$$\sup_{\mu \in R} \{\beta(f \mid \mu, c\lambda)\} > \sup_{\mu \in R} \{\beta(f^*_{\lambda, p} \mid \mu, c\lambda)\}$$

for every $c$.




NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of general interest 
Personal Items 


Mr. Harry H. Goode, formerly head of the Special Projects Branch, Special 
Device Center, Office of Naval Research 1, New York, is now Supervisor of the
Aero-Physics Group, Aeronautical Research Center, University of Michigan, 
Ann Arbor, Michigan. 

Mr. William G. Howard, who was previously employed by the Johns Hopkins 
University, Institute for Cooperative Research, is presently employed as Mathe- 
matical Statistician in the Air Studies Division of the Library of Congress. 

Miss Margaret Kampschaefer has accepted a position as Statistician in the
U.S. Bureau of Labor Statistics, Minnesota Payroll Project, Minnesota Division 
of Employment and Security. She was formerly employed as Junior Mathe- 
matician at the Argonne National Laboratory, Naval Reactor Division, Chi- 
cago, Illinois. 

Dr. Albert Noack has recently been appointed Professor of Actuarial Mathe- 
matics at the University of Koeln, Germany. 


Second Berkeley Symposium on Mathematical Statistics and Probability 


The Second Berkeley Symposium will be held at the Statistical Laboratory, 
University of California, Berkeley, from July 31 to August 12, 1950, with the
cooperation of the American Statistical Association (Biometrics Section), the
Biometric Society (Western North American Region), the Econometric Society, 
the Institute of Mathematical Statistics, the Institute of Transportation and 
Traffic Engineering (UC), and the Office of Naval Research. 

The Symposium will include sessions on mathematical statistics, probability, 
biometrics, econometrics, traffic engineering, astronomy, and physics. The com- 
plete program may be obtained from the Statistical Laboratory. The papers will 
be published by the University of California Press as the Proceedings of The 
Second Symposium. 


Cumulative Index of Volumes 1-20 

Attention is called to the fact that there is now available a cumulative index 
for Volumes 1 through 20 (1930-1949) of the Annals of Mathematical Statistics. 
Copies may be secured from the office of the Secretary-Treasurer for $1.00 per 
copy. 

New Members 
The following persons have been elected to membership in the Institute 


(December 1, 1949 to February 28, 1950) 


Bain, John C., B.A. (Univ. of Toronto), President’s Statistician, Abitibi Power 
& Paper Company, Ltd., 408 University Avenue, Toronto 2, Ontario, 
Canada. 

Blakemore, George J., Jr., A.B. (George Wash. Univ.), Student at George Wash- 
ington University, 1748 Hobart St., N.W., Washington 10, D.C. 

Bross, Irwin, D. J., Ph.D. (North Carolina State College), Research Associate, 
Department of Biostatistics, School of Public Health, The Johns Hopkins 
University, 615 North Wolfe Street, Baltimore 5, Maryland. 

Cansado Maceda, Enrique, Ph.D. (University of Madrid), Assistant Professor 
of Mathematical Statistics, Faculty of Sciences, University of Madrid and 
Official of the National Institute of Statistics, Paseo de Rosales, 50 Madrid, 
Spain. 

Clatworthy, Willard H., M.A. (Univ. of Kentucky), Student at the University 
of North Carolina, Box 168, Chapel Hill, North Carolina. 

Dinsmore, Robert J., A.B. (Univ. of Calif.), Student at the University of Cali- 
fornia, Berkeley, California, 2428 Milvia St., Berkeley 4, California. 

Enell, John W., Eng. Sc.D. (New York Univ.), Assistant Professor of Adminis- 
trative Engineering, New York University, 71 Ayers Court, West Englewood, 
New Jersey. 

Flores, Anna M., M.Sc. (Univ. of Mexico), Mathematician, Torres Adalid 
#511, Mexico City.

Garner, Norman R., B.A. (Univ. of Rochester), Graduate Student at Univer- 
sity of North Carolina, 15 Goldston Ave., Carrboro, North Carolina. 

Hannan, James F., M.A. (Harvard), Research Assistant, Department of Mathe- 
matical Statistics, University of North Carolina, P.O. Box 168, Chapel 
Hill, North Carolina. 




Klein, Joseph, B.S. (Rutgers), Graduate Student at Rutgers University, P.O. 
Box 501, Red Bank, New Jersey. 

Lewis, Evan J., Ph.D. (Cornell Univ.), Physicist, Corning Glass Works, Corning, 
New York. 

Palekar, Madhukar N., B.S. (Bombay), Graduate Student in Department of 
Mathematical Statistics and Departmental Assistant, 108 Furnald Hall, 
Columbia University, New York 27, New York. 

Page, Woodrow W., M.A. (Oklahoma Univ.), Graduate Student, University of 
North Carolina, 241 Jackson Circle, Chapel Hill, North Carolina. 

Pretorius, S. J., Ph.D. (Univ. of London), Professor of Statistics, University of 
Stellenbosch, Soeteweide, Stellenbosch, Union of South Africa. 

Price, Don C., M.A. (Kent State Univ.), Student, Department of Mathematical 
Statistics, University of North Carolina, 1621 Shorb Ave., N.W., Canton 8, 
Ohio. 

Scalora, Frank S., A.B. (Harvard), Assistant in Mathematics, 106 Mathematics 
Building, University of Illinois, Urbana, Illinois. 

Somerville, Paul N., B.Sc. (Alberta, Canada), Graduate Student in Department 
of Mathematical Statistics, University of North Carolina, 316-B Dormitory, 
Chapel Hill, North Carolina. 

Sirken, Monroe G., M.A. (Univ. of Calif. at L. A.), Research Associate, Labora- 
tory of Statistical Research, Department of Mathematics, University of 
Washington, Seattle, Washington. 

Stearn, Joseph L., M.S. (College of N. Y.), Mathematician, U. S. Coast & 
Geodetic Survey, Department of Commerce, Washington, D. C. 

Whelan, Walter J., M.A. (Boston Univ.), Student, Department of Mathematical 
Statistics, Columbia University, New York, 119 Wilmington Ave., Dor- 
chester 24, Massachusetts. 

Wile, Janet L., A.B. (Univ. of Rochester), Statistician, Department of Defense, 
Army and Transportation Corps, #156, 1813 Queens Lane, Arlington, 
Virginia. 

Wilhelmsen, Lars, Aktuarkandidat (Oslo Univ.), Actuary, Storebrand, Boks 
425, Oslo, Norway. 



REPORT OF THE CHAPEL HILL MEETING OF THE INSTITUTE 


The forty-second meeting of the Institute of Mathematical Statistics was 
held jointly with the Biometric Society (Eastern North American Region) at 
the Chapel Hill campus of the University of North Carolina on Friday, March 
17, and Saturday, March 18, 1950. One hundred twenty-one persons registered, 
including the following members of the Institute: 


R. L. Anderson, T. W. Anderson, Geoffrey Beall, C. A. Bennett, Mrs. C. A. Bennett, Nils 
Blomqvist, R. C. Bose, R. A. Bradley, Irwin Bross, Glen Burrows, L. D. Calvin, Uttam 
Chand, W. G. Cochran, A. C. Cohen, Jr., W. S. Connor, Jr., P. P. Crump, E. E. Cureton, 
R. C. Davis, W. L. Deemer, T. G. Donnelly, Churchill Eisenhart, J. W. Fertig, S. G. 
Ghurye, Leon Gilford, B. G. Greenberg, F. E. Grubbs, Max Halperin, J. F. Hannan, Boyd 
Harshbarger, C. R. Henderson, Wassily Hoeffding, Harold Hotelling, A. S. Householder, 
W. G. Howard, S. L. Isaacson, A. W. Kimball, Jr., B. F. Kimball, Marguerite Lehr, Guido 
Liserre, Eugene Lukacs, C. L. Marks, H. A. Meyer, Paul Minton, D. J. Morrow, Jack 
Moshman, C. M. Motley, M. L. Norden, H. W. Norton, Ingram Olkin, Paul Peach, J. A. 
Rafferty, Wyman Richardson, Jr., Herbert Robbins, S. N. Roy, S. A. Schmitt, R. E. Serfling, 
D. H. Shepard, P. N. Somerville, E. W. Stacy, J. W. Tukey, D. F. Votaw, Jr., 
F. M. Wadley, M. A. Woodbury, Marvin Zelen. 


Professor R. L. Anderson presided at the opening session for contributed 
papers on Friday morning. The following papers were presented: 


1. A Method of Estimating the Parameters of an Autoregressive Time Series. Mr. S. G. 
Ghurye, University of North Carolina. 

2. Most Powerful Rank Order Tests. Professor Wassily Hoeffding, University of North 
Carolina. 

3. The Comparison of Percentages in Matched Samples. Professor W. G. Cochran, Johns 
Hopkins University. 

4. A Method of Estimating Components of Variance in Disproportionate Numbers. Pro- 
fessor H. L. Lucas, North Carolina State College. 

5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying the Values 
of Two Parameters. Mr. S. L. Isaacson, Columbia University. 

6. A Note on Orthogonal Arrays. Professor R. C. Bose, University of North Carolina. 

7. Transformations Related to the Angular and the Square Root. Mr. M. F. Freeman 
and Professor J. W. Tukey, Princeton University. 

8. Standard Inverse Matrices for Fitting Polynomials. Mr. F. J. Verlinden, North Caro- 
lina State College. 


On Friday afternoon Dr. James A. Rafferty, School of Aviation Medicine, 
Randolph Field, Texas, gave an invited address on Mathematical Models in 
Biology. Professor Gertrude M. Cox then presided at a session for contributed 
papers, at which the following papers were presented: 


1. Small Sample Performance of Biological Statistics. Mr. Irwin Bross, Johns Hopkins 
University. 

2. Methodology in the Study of Physical Measurements of School Children. Professor 
B. G. Greenberg and Professor A. H. Bryan, University of North Carolina. 

3. Tetrad Analysis in Yeast. Dr. A. S. Householder, Oak Ridge National Laboratory. 

4. Contribution to the Probabilistic Theory of Neural Nets. I. Randomization of Refrac- 
tory Periods and of Stimulus Intervals. Professor Anatol Rapoport, University of 
Chicago. 

5. Theoretical and Experimental Aspects in the Removal of Airborne Matter by the Human 
Respiratory Tract. Professor H. D. Landahl, University of Chicago. (Read by Pro- 
fessor Rapoport.) 

6. An Application of Biometrics to Zoological Classifications. Dr. F. M. Wadley, Navy 
Department, Washington, D. C. 

7. The Analysis of Hematological Effects of Chronic Low-level Radiation. Mr. Jack Mosh- 
man, United States Atomic Energy Commission, Oak Ridge, Tennessee. 




A joint dinner of the two sponsoring organizations was held at the Carolina 
Inn on Friday evening, with an attendance of sixty-two. Professor W. G. Cochran 
as toastmaster introduced Chancellor R. B. House of the University of North 
Carolina who welcomed the gathering with words and music. Professor Gertrude 
M. Cox responded for the Biometric Society and Professor D. F. Votaw for the 
Institute. 

Professor Harold Hotelling presided at a Saturday morning symposium on 
multivariate analysis. Professor E. E. Cureton of the University of Tennessee 
gave the opening address on Statistical Problems in Psychological Testing. After 
a lively discussion the following contributed papers were presented: 


1. Accuracy of a Linear Prediction Equation in a New Sample. Professor George E. 
Nicholson, Jr., University of North Carolina. 

2. Independence of Quadratic Forms in Normally Correlated Variables. Professor Yuki- 
yosi Kawada, Tokyo University of Literature and Science, Tokyo, Japan. (Read 
by the chairman). 

3. Bounds on the Distribution of Chi-square. Mr. S. A. Vora, University of North 
Carolina. 


This was followed by a Biometric Society address by Professor C. R. Henderson 
of Cornell University on Estimation of Genetic Parameters. 

Professor W. G. Cochran presided at the final session for contributed papers 
on Saturday afternoon. The following papers were presented: 


1. Estimating the Mean and Standard Deviation of Normal Populations from Double 
Truncated Samples. Professor A. C. Cohen, Jr., University of Georgia. 
2. Minimax Estimates of Location and Scale Parameters. Mr. Gopinath Kallianpur, 
University of North Carolina. 
3. On Some Features of the Neyman-Pearson and Wald Theories of Statistical Inference, 
Their Interrelations and Bearing on Some Usual Problems of Statistical Inference. 
Professor S. N. Roy, University of North Carolina. 
4. Note on Uniformly Best Unbiased Estimates. Mr. R. C. Davis, Naval Ordnance Test 
Station, Inyokern, Calif. 

5. Competitive Estimation. Professor Herbert Robbins, University of North Carolina. 

6. The Effect of an Unknown ‘Location Disturbance’ on “Student’s” t Based on a Linear 
Regression Model. Professor Uttam Chand, Boston University. 

7. Corrections for Non-normality for the Two-sample t and F distributions Valid for 
High Significance Levels. Professor Ralph A. Bradley, McGill University. 

8. Some Tests Based on the Empirical Distribution Function. Mr. J. F. Hannan, Uni- 
versity of North Carolina. 

9. On a Generalization of the Behrens-Fisher Problem. (By title). Dr. John E. Walsh, 
Rand Corporation, Santa Monica, Calif. 

10. Construction of Partially Balanced Designs with Two Accuracies. (By title). Mr. 
S. S. Shrikhande, University of North Carolina and Nagpur College, Nagpur, India. 

11. Designs for Two-way Elimination of Heterogeneity. (By title). Mr. S. S. Shrikhande. 

12. Designs for Animal Feeding Experiments. (By title). Mr. S. S. Shrikhande. 

13. A Truncated Sequential Procedure for Interval Estimation, with Applications to the 
Poisson and Negative Binomial Distributions. (By title). Mr. D. Martin Sandelius, 
University of Washington and Uppsala University, Uppsala, Sweden. 

14. A Generalization of the Method of Maximum Likelihood: Estimating a Mixing 
Distribution. (By title). Professor Herbert Robbins, University of North Carolina. 

15. Smallest Average Confidence Sets for the Simultaneous Estimation of k Normal 
Means. (By title). Mr. Raghu Raj Bahadur, University of North Carolina. 




About eighty-five members of the two organizations attended a tea given by 
Professor and Mrs. Hotelling at the conclusion of the Saturday afternoon 
session. 


HERBERT ROBBINS 
Assistant Secretary 








